Executive Summary

This final report summarizes the major project activities and results achieved during the last three years. The project has involved a great amount of theoretical and development work. The primary goal was to demonstrate an SNR gain without using multiple antennas, so that transmission range can be increased. This goal has been reached by using UWB time reversal on our real-time testbed. Propelled by the project, our overall research has produced results in a number of areas. The first UWB time reversal system with real-time waveform precoding at the transmitter was demonstrated in our lab in 2009. The original time reversal technique has been generalized by transmitting an optimization-based waveform that takes implementation limitations into account. Considerable experience in real-time system design and implementation has been gained throughout the development cycle.

This report is organized as follows. An overall project review and a system description are provided in Chapters 1 and 2. Major achievements and findings, including development and theoretical work, are given in Chapters 3 and 4. Chapters 5 and 6 describe current and future work influenced by this project, followed by an appendix with a few of the most recent test results.
Chapter 1


Project  Review 


The major goal of this project is to develop a true UWB radio with time reversal precoding at the transmitter. Progress has been made steadily over the last three years; the milestones achieved are listed in Table 1.1. The first UWB time reversal system with real-time waveform precoding at the transmitter was demonstrated in our lab in 2009. An SNR gain of about 4 dB was achieved in an indoor multiroom environment. An arbitrary waveform generator with 14-bit resolution and 500-MHz bandwidth has been built using a commercial DAC and FPGA board. A unique three-stage burst-mode synchronization scheme has been implemented in the FPGA. In addition to our development work, this project has also driven theoretical research efforts. Among the many byproducts of the project is the radio testbed, which is now playing an important role in exploring multichannel front-ends and testing algorithms in real time. This project is benefiting, and will continue to benefit, our research work.


Table 1.1: Milestones Achieved

Milestone                                 Date
Stable Architecture                       3rd quarter 2007
Arbitrary Waveform Generating             1st quarter 2008
Time Reversal System (baseline)           2nd quarter 2008
Full-Function Time Reversal               1st quarter 2009
Performance Test and Trials               3rd quarter 2009
System Improvement & Function Extension   4th quarter 2009


Chapter  2 


Testbed  System  Description 


A general-purpose time-reversal UWB transceiver testbed for communication and ranging was developed. The main goal is to implement a transmitter-receiver pair to verify our time-reversal proof-of-concept schemes. The design is based on energy (square-law) detection. On April 14, 2008, we demonstrated the very first time reversal UWB radio in our lab. In April 2009 we improved the system; demonstrations showed that time reversal precoding can provide a 3 to 4 dB gain over an ordinary single-carrier UWB system in indoor non-line-of-sight (NLOS) multipath environments. The major parameters of the testbed under test are as follows.

•  10-dB  Bandwidth:  400  to  800  MHz 


•  Chip  rate:  25  Mc/s 

•  Bit  rate:  6.25,  3.125,  1.5625  Mb/s 

•  Synchronization  method  specially  designed  for  burst  mode  transmission 

•  Digital  modulation:  on/off  keying  (OOK) 

•  Receiver  sensitivity:  -81  dBm 

•  Waveform  Generator  with  8-bit  quantization  and  1-GHz  sampling 

•  I/Q  quadrature  frequency  up-conversion 

•  Energy  detector  with  different  integration  window  sizes 

• Adjustable and adaptive thresholds in synchronization and demodulation phases

This testbed design has a few advantages:

•  Passband  I/Q  frequency  up-conversion  flexible  for  spectrum  arrangement 


•  The  received  waveform  envelope  is  optimized  by  adjusting  a  baseband  complex  FIR  filter  at  the  transmitter 

•  Virtually  arbitrary  waveform  generating,  taking  full  advantage  of  channel  information. 

•  OOK  modulation  with  simple  energy  detector  receiver  robust  to  timing  error 

•  Analog-to-digital  converter  (ADC)  and  digital  processing  at  the  receiver  able  to  test  various  algorithms 

• The selected spectrum is very clean and the channels are very stable, so that system tests and trials are convenient.
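The chip rate and bit rates in the parameter list above are tied together by simple arithmetic. The short sketch below works it out; reading each bit rate as an integer number of chips per bit is an inference from the listed numbers, not something the report states, and the function names are illustrative only.

```python
# Timing arithmetic for the testbed parameters listed above.
CHIP_RATE_HZ = 25e6                          # 25 Mc/s
BIT_RATES_BPS = [6.25e6, 3.125e6, 1.5625e6]  # 6.25, 3.125, 1.5625 Mb/s


def chip_period_ns(chip_rate_hz):
    """Duration of one chip in nanoseconds."""
    return 1e9 / chip_rate_hz


def chips_per_bit(chip_rate_hz, bit_rate_bps):
    """Number of chips spanned by one bit (assumed to be a spreading length)."""
    return chip_rate_hz / bit_rate_bps


if __name__ == "__main__":
    print(f"chip period: {chip_period_ns(CHIP_RATE_HZ):.1f} ns")
    for rb in BIT_RATES_BPS:
        print(f"{rb / 1e6:g} Mb/s -> {chips_per_bit(CHIP_RATE_HZ, rb):g} chips per bit")
```

Under this reading, halving the bit rate doubles the number of chips, and hence the energy, available per bit.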


Pulse-based signaling and transmitter-side processing are adopted as the system design guideline. Although direct pulse (carrierless) transmission can largely reduce the complexity of the transmitter RF front-end, it is not a good choice for a multi-purpose radio testbed, mainly because of its inflexibility. A modulated pulse is not only easy to generate but also more flexible: the center frequency is determined by a local oscillator and the spectral shape is governed by the baseband pulse. The conceptual testbed architecture is shown in Fig. 2.1. Following an FIR filter (embedded in the FPGA), the digital-to-analog converter (DAC) outputs the desired analog waveforms. The simple-receiver philosophy is reflected in this testbed by OOK modulation and a diode-based non-coherent detector at the receiver. At the receiver, demodulation is done in the digital domain, so that algorithms and parameters can be adjusted easily. Of course, the analog-to-digital converter (ADC) is power hungry, and it may be replaced by substitute circuits in future commercial products.

Noticeable temporal focusing has been observed, and successful data transmission has been experimentally demonstrated in this somewhat harsh environment. Doubling the transmission distance involved a number of challenging tasks, including porting the receiver back-end from the Virtex-2 platform to the Virtex-5 platform, increasing the prefilter length, and solving the dynamic range problem. At the current stage, the receiver back-end runs on the Virtex-5 platform and the system has been modified to double the transmission distance.


2.1  Transmitter 

At the testbed’s transmitter side, there are five main parts: the Xilinx Virtex-5 LXT Prototype Platform, the Fujitsu DK86064 DAC Evaluation Kit, the TRF3703-15 Quadrature Modulator Evaluation Module, the PSA4000A Local Oscillator Evaluation Board and the Mini-Circuits ZVE-8G Amplifier. The transmitter architecture is shown in Fig. 2.2. Baseband signals from the DAC are converted to a passband signal by the modulator and then sent to the amplifier; the amplifier output goes to the antenna and is transmitted over the air. The major components and modules of the transmitter are listed in Table 2.1.

Figure 2.1: Overall testbed architecture.

Figure 2.2: The architecture of the testbed transmitter.

Table 2.1: A List of Selected Components/Modules for Transmitter

Component                 Part          Vendor             Key specifications
FPGA back-end             Virtex-5 LXT  Xilinx             65 nm / 1.0 V core / 550 MHz / 17,280 slices / 680 user I/O
DAC                       MB86064       Fujitsu            Dual 14-bit, 1 GSa/s
Local oscillator          PSA4000A      Z-Communications   Center frequency 4.0 GHz
Modulator (up-converter)  TRF3703-15    Texas Instruments  Direct quadrature modulator, 400 MHz to 4 GHz
Amplifier                 ZVE-8G        Mini-Circuits      Wideband (2 to 8 GHz), low noise (4 dB typ.), 35 dB gain

A discrete-time waveform is generated by the waveform generator module in the FPGA based on the chip value, the scrambling code and the pre-loaded waveform template. The discrete-time waveform is then fed to the DAC via the high-speed connection buses. The waveform generator is described in detail in Chapter 3.

The PSA4000A local oscillator generates the 4.0 GHz carrier for the modulator. It is configured by the FPGA board through an SPI port: whenever the FPGA board is powered on, the local oscillator is automatically configured to operate at 4.0 GHz. As the RF picture shows, a DB9 cable connects the local oscillator to the FPGA. The TRF3703-15 quadrature modulator converts complex modulated signals from baseband directly up to RF.


2.2  Receiver 

The receiver design is based on energy (square-law) detection. Its architecture is shown in Fig. 2.3 and a photograph is shown in Fig. 2.4. The MAX108, an 8-bit, 1.5 GHz flash ADC, converts the analog baseband signal into a digital signal. The digital back-end runs on the newly ported Virtex-5 LXT development board.


Figure  2.3:  The  architecture  of  the  testbed  receiver 

All algorithms and signal processing tasks are described in Verilog and implemented in Xilinx Virtex-family FPGAs. A module-based coding style is adopted to expedite the overall development [1]. In the current testbed, an advanced Xilinx Virtex-5 FPGA board supporting high-speed connections has replaced the old Virtex-5 board at the transmitter side, and the receiver has become much more powerful after the Virtex-2 Pro FPGA board was replaced by a Virtex-5 FPGA board. Among the various features of the Xilinx Virtex-5 FPGA family, its high-speed capability for signals, clocks and I/O, as well as its embedded IP cores, are extremely important for implementing wide-bandwidth systems.

Figure 2.4: The picture of the testbed receiver

Table 2.2 shows the FPGA resource usage on the Xilinx Virtex-5 LXT FPGAs. Note that the FPGA chips used at the transmitter and the receiver are of the same model, but the transmitter FPGA chip has 1136 I/O ports while the receiver FPGA chip has only 665 I/O ports.


Table  2.2:  FPGA  Implementation  Statistics  at  Transmitter  and  Receiver 


Resources                     Transmitter   Transmitter    Receiver      Receiver
                              amount used   percent used   amount used   percent used
Number of Slice Registers     5314          18%            4184          14%
Number of Slice LUTs          5019          13%            3088          10%
Number of occupied Slices     1719          24%            1713          23%
Number of bonded IOBs         40            8%             90            25%
Number of BlockRAM/FIFO       2             3%             1             1%
Number of BUFG/BUFGCTRLs      7             21%            10            31%
Number of DSP48Es             NA            NA             32            66%
Total equivalent gate count   275,214       NA             195,438       NA
Part II

Achievements and Findings

Chapter 3


Development  Aspects 


3.1  UWB  Time  Reversal:  From  Theory  to  Practice 

3.1.1  Background 


Stimulated by the FCC’s decision to allow UWB waveforms to overlay other systems, UWB radio has received significant attention recently [2-13]. Mainly because of their potentially low implementation complexity, suboptimal reception strategies such as transmitted reference (TR) [4] [8], autocorrelation demodulation (ACD) [7] [9] [14] and energy detection [10,15] have received increasing attention for complexity- and cost-constrained UWB applications. However, these systems suffer performance loss in rich multipath environments. The UWB channel impulse response (CIR) contains a large number of resolvable components arriving over different paths, especially in indoor environments. Our emphasis has been on making good use of these signal components.

Time reversal is a signal focusing technique that can turn multipath into a benefit and shift part of the receiver complexity burden to the transmitter side [15,16]. Time reversal (TiR) originated in underwater acoustic and ultrasound communications [16], and it has recently been extended to wireless applications [12,13,17,18]. For a given time and location, TiR precoding has been proved mathematically to be optimum in the sense that it maximizes the amplitude of the field at that time and location [17]. It is therefore called a spatio-temporal matched filter [19], because it is analogous to a matched filter in both time and space. It is also called a transmit matched filter, since the matched filter is placed at the transmitter side. The key element in a time reversal system is a transmitter-side filter that pre-filters the signal, leading to a condensed equivalent CIR. Two characteristics of time reversal are temporal focusing and spatial focusing. Temporal focusing can soften the impact of ISI. Time reversal combined with an antenna array results in spatial focusing, which concentrates energy at a desired location and enables a number of unique functions. A straightforward application is “spatial-division” multiple access. We can also take advantage of this spatial discrimination ability to reduce leakage of a signal sent to an intended user, or to prevent the signal from being detected or intercepted at other locations. This is equivalent to encryption using the CIR and provides additional location-based security at the physical layer [18,20].
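Temporal focusing can be illustrated with a toy discrete-time channel. The sketch below is only an illustration under stated assumptions: the channel taps are invented (not measured data), the signal is real-valued, and the function names are hypothetical. It pre-filters with the time-reversed CIR and shows that the equivalent CIR is the channel autocorrelation, whose peak equals the channel energy.

```python
# Time-reversal pre-filtering on a toy discrete-time channel.
# The channel taps below are made up for illustration; they are not measured data.

def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def time_reversal_prefilter(h):
    """Pre-filter taps: the channel impulse response reversed in time (real case)."""
    return list(reversed(h))


h = [0.6, -0.3, 0.0, 0.5, 0.2, -0.4, 0.1]   # hypothetical multipath CIR
g = time_reversal_prefilter(h)
h_eq = convolve(g, h)                        # equivalent CIR seen by the receiver

# The strongest tap of the equivalent CIR sits at the zero-lag position.
peak_index = max(range(len(h_eq)), key=lambda n: abs(h_eq[n]))
channel_energy = sum(x * x for x in h)

if __name__ == "__main__":
    print("equivalent CIR:", [round(x, 3) for x in h_eq])
    print("peak index:", peak_index, "peak value:", round(h_eq[peak_index], 3))
    print("channel energy:", round(channel_energy, 3))
```

All multipath components add coherently at the peak, while the remaining taps of the equivalent CIR are autocorrelation sidelobes; this is the "condensed equivalent CIR" referred to above.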

While promising, applying UWB time reversal is at present extremely challenging. The main difficulties come from implementing the pre-filter at such a high bandwidth. The pre-filter should accurately represent the time-reversed waveform. However, high-fidelity signal representation requires a high sampling rate and high amplitude resolution, implying a prohibitively expensive solution. In this work a digital FIR filter followed by an interpolator is considered as the pre-filter. We found that the requirements on the FIR filter’s tap spacing, coefficient resolution and filter length can be relaxed significantly while still providing good focusing.


3.1.2  Waveform  Preprocessing 

Modern communication systems rely on analogue and digital signal processing to combat various impairments such as noise, channel fading (flat and frequency-selective), and interference. For a given environment, a globally optimum system under a chosen criterion delivers the best performance. By “global” we mean joint transmitter-receiver optimization. However, most existing work on joint transmitter-receiver optimization is at the symbol-rate level [21-26]. It can be expected that by breaking the symbol-rate constraint and allowing continuous-time or fractional-symbol-level processing, the performance can be improved further. In other words, given a communication channel, we prefer to design a set of transmit waveforms together with a receiver of suitable structure and algorithm, such that the system is optimum under given conditions and constraints. The signal processing goal could be reaching capacity, maximum SNR, minimum mean square error (MMSE), etc. In most wireless communication scenarios the channel characteristics are time-varying and up-to-date channel information is available only at the receiver. This is why signal processing efforts have traditionally been focused on the receiver side.

It is reasonable to consider single-carrier pulse-based radio links for wireless sensor network (WSN) applications, since they have the potential to be of low complexity if the major receiver-side linear processing functions are shifted to the transmitter side. It is well known that low probability of interception and low probability of detection (LPI/LPD) can be achieved readily with pulse-based signaling. Also, a narrow-pulse-based single-carrier system can be used for ranging or penetration radar purposes.

A unique issue in wireless communications is frequency-selective fading caused by multipath propagation. As for interference in radio systems, ISI and inter-user interference (IUI) are the typical concerns. Listed below are some receiver-based schemes to handle noise, multipath and interference.

• Matched filter (MF): it is placed at the receiver and matches the given transmit symbol (or chip) waveform; for AWGN, it maximizes the SNR at the peak of the MF’s output.

• RAKE receiver: a family of techniques to collect the signal energy dispersed over multipath components; an ideal RAKE receiver actually matches the overall received waveform; when the major paths are resolvable, a practical RAKE receiver structure can be a combination of a regular chip-level MF and a multipath combiner functioning as a finite impulse response (FIR) filter; a RAKE receiver operates at the fractional-symbol (or fractional-chip) rate.

• Equalization: symbol-rate discrete-time processing to remove or reduce ISI; an equalizer is placed at the receiver, usually taking symbol-rate samples of the MF’s output as its input.

• (Receiver-based) multi-user detection: a broad range of techniques that remove or reduce IUI; multi-user detection operates at the symbol rate.
When channel and/or multi-user information is available at the transmitter, there seems to be no strong reason not to use it. Recently, transmitter-based signal processing, or preprocessing, has received increasing attention. Preprocessing can be time-reversal pre-filtering (or pre-RAKE), pre-equalization, multi-user precoding, or processing for a set of compromise goals. Preprocessing can be used alone in a transmitter-centric system, where traditional receiver-based signal processing is shifted to the transmitter to simplify the receiver. It can also work with receiver-based signal processing to achieve joint transmitter-receiver optimization.

Our special interest is in preprocessing at the fractional-symbol level (or simply, waveform preprocessing), in conjunction with a suboptimal receiver. Below are two examples of symbol waveforms.


• Pulse shaping: it uses a transmitter-side filter to create a transmit waveform with the desired spectral roll-off; traditional pulse shapes include raised cosine and truncated sinc functions, etc.


• Maximum-SNR transmit waveform: a transmit symbol waveform that achieves the maximum SNR at the MF’s output; it is an eigenfunction of the CIR autocorrelation (implying that the CIR has to be known first), and a homogeneous Fredholm integral equation needs to be solved for the eigenfunction with the strongest channel gain [27,28].


The transmit waveform optimization problem can be stated in more detail as follows. It is well known that the optimum receiver matches the whole symbol waveform as distorted by the channel, not the transmitted symbol waveform. However, from a system optimization point of view, such waveform matching alone is not enough. We can further maximize the SNR at the receiver by carefully designing the transmitted waveform [27,28].

Given the channel impulse response $h(t)$ and a fixed transmitted power $P_t$, we wish to achieve the maximum SNR at the receiver by jointly designing the transmitted waveform and a good receiver. This problem has been discussed in [27] for communication over troposcatter channels and in [28] for radar detection.


Assume that the transmitted pulse $p(t)$ (to be optimized) is confined to the symmetric time interval $[-T/2, T/2]$. The energy of the transmitted pulse is then

$$E_p = \int_{-T/2}^{T/2} |p(t)|^2 \, dt. \tag{3.1}$$


It follows from detection theory that the best receiver is still an MF matched to the received waveform $p(t) * h(t)$, where $h(t)$ is the CIR and $*$ denotes the convolution operation. The (maximum) SNR at the output of such an MF is given by

$$\mathrm{SNR} = 2E_y/N_0, \tag{3.2}$$

where $E_y = \int_{-\infty}^{\infty} |p(t) * h(t)|^2 \, dt$ is the received signal energy. The problem then reduces to finding the optimum $p(t)$ such that $E_y$ is maximized, under the constraint of fixed $E_p$.


It has been shown in [29] (p. 125) and [28] that the optimum $p(t)$ can be obtained by solving the following homogeneous Fredholm integral equation

$$\mu_n \phi_n(t) = \int_{-T/2}^{T/2} K(t - \tau)\, \phi_n(\tau) \, d\tau, \tag{3.3}$$


and letting $p(t) = \phi_0(t)$, where $\phi_0(t)$ is the eigenfunction corresponding to the maximum eigenvalue $\mu_0$ and the kernel $K(t)$ is the autocorrelation of the CIR: $K(t) = h(t) * h(-t)$. When convolved with the kernel over the interval $[-T/2, T/2]$, the pulse waveform $\phi_0(t)$ reproduces itself, scaled by a constant $\mu_0$. With the optimum $p(t) = \phi_0(t)$ we achieve the maximum SNR

$$\mathrm{SNR} = 2\mu_0 E_p/N_0. \tag{3.4}$$

It is worth noting that this maximum-SNR waveform may lead to severe ISI if the duration of $p(t) = \phi_0(t)$ exceeds the symbol duration, which is one reason preventing the scheme from being widely applicable.

Figure 3.1: A conceptual transmitter structure capable of preprocessing (in the RF front-end, frequency up- and down-conversion may be used to shift the signal frequency).
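A discrete-time analogue of the maximum-SNR waveform problem may help make the eigenfunction statement concrete. In the sketch below the integral kernel becomes a Toeplitz matrix built from the CIR autocorrelation, and the dominant eigenvector is approximated by power iteration. The channel taps, the pulse length and the choice of power iteration are assumptions made for illustration; they are not taken from the report.

```python
# Discrete-time analogue of the eigenfunction problem, solved by power iteration.
# The CIR taps and pulse length are hypothetical values chosen for illustration.

def autocorrelation(h):
    """K[m] = sum_k h[k] h[k+m] for lags m = 0 .. len(h)-1 (real-valued h)."""
    n = len(h)
    return [sum(h[k] * h[k + m] for k in range(n - m)) for m in range(n)]


def kernel_matrix(h, pulse_len):
    """Toeplitz matrix R[i][j] = K[|i-j|], the sampled kernel K(t - tau)."""
    k = autocorrelation(h)
    return [[k[abs(i - j)] if abs(i - j) < len(k) else 0.0
             for j in range(pulse_len)] for i in range(pulse_len)]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def normalize(x):
    norm = sum(v * v for v in x) ** 0.5
    return [v / norm for v in x]


def dominant_eigenpair(a, iterations=500):
    """Power iteration started from a unit impulse in the middle of the interval."""
    x = [0.0] * len(a)
    x[len(a) // 2] = 1.0
    for _ in range(iterations):
        x = normalize(matvec(a, x))
    mu = sum(xi * yi for xi, yi in zip(x, matvec(a, x)))  # Rayleigh quotient
    return mu, x


def received_energy(p, h):
    """E_y = sum |(p * h)[n]|^2, computed by direct convolution."""
    y = [0.0] * (len(p) + len(h) - 1)
    for i, pi in enumerate(p):
        for j, hj in enumerate(h):
            y[i + j] += pi * hj
    return sum(v * v for v in y)


h = [0.6, -0.3, 0.0, 0.5, 0.2, -0.4, 0.1]    # hypothetical CIR
PULSE_LEN = 8                                # hypothetical pulse support (samples)
mu0, p_opt = dominant_eigenpair(kernel_matrix(h, PULSE_LEN))
impulse = [1.0] + [0.0] * (PULSE_LEN - 1)    # unit-energy single-tap pulse

if __name__ == "__main__":
    print("estimated largest eigenvalue:", mu0)
    print("received energy, eigen pulse:", received_energy(p_opt, h))
    print("received energy, single tap :", received_energy(impulse, h))
```

For a unit-energy pulse, the received energy equals the quadratic form of the pulse with the kernel matrix, so the estimated eigenvalue plays the role of the channel gain in (3.4).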

A transmitter capable of general waveform synthesis or arbitrary waveform generation would be the ultimate goal of preprocessing. A digital FIR filter based waveform generator is flexible and becomes more and more feasible as semiconductor technology advances. A common structure for this type of waveform generator is a digital FIR filter followed by an analogue interpolating or shaping filter, as illustrated in Fig. 3.1. However, for any digital implementation, the sampling rate and the quantization resolution are limited. From an implementation perspective, a trade-off between performance and feasibility or cost must be made. Time reversal with mono-bit or ternary quantization and sub-Nyquist-rate sampling has been shown to work satisfactorily in computer simulations and testbed experiments [20,30]. These practical limitations should be considered in the performance optimization.
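The effect of coarse coefficient quantization can be tried on a toy example. The sketch below compares full-precision, mono-bit and ternary versions of a time-reversal pre-filter; the channel taps, the ternary dead-zone threshold and the helper names are hypothetical, and the example says nothing about the sub-Nyquist sampling aspect.

```python
# Coarse quantization of time-reversal pre-filter taps (toy example).
# Channel taps and the ternary dead-zone threshold are hypothetical.

def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def sign(x):
    return 1.0 if x >= 0 else -1.0


def full_prefilter(h):
    """Full-precision time-reversed CIR."""
    return list(reversed(h))


def monobit_prefilter(h):
    """Time-reversed CIR with each tap reduced to its sign (+1 / -1)."""
    return [sign(x) for x in reversed(h)]


def ternary_prefilter(h, dead_zone):
    """Time-reversed CIR quantized to {-1, 0, +1}; small taps are zeroed."""
    return [0.0 if abs(x) < dead_zone else sign(x) for x in reversed(h)]


def peak_index(seq):
    return max(range(len(seq)), key=lambda n: abs(seq[n]))


def peak_to_sidelobe(seq):
    """Largest |tap| divided by the second largest |tap|."""
    mags = sorted((abs(x) for x in seq), reverse=True)
    return mags[0] / mags[1]


h = [0.6, -0.3, 0.0, 0.5, 0.2, -0.4, 0.1]    # hypothetical multipath CIR
equivalent = {
    "full": convolve(full_prefilter(h), h),
    "mono-bit": convolve(monobit_prefilter(h), h),
    "ternary": convolve(ternary_prefilter(h, dead_zone=0.15), h),
}

if __name__ == "__main__":
    for name, h_eq in equivalent.items():
        print(f"{name:9s} peak at {peak_index(h_eq)}, "
              f"peak/sidelobe = {peak_to_sidelobe(h_eq):.2f}")
```

In this toy case the focal peak stays at the same delay for all three pre-filters, because only the sign of each tap is needed for the multipath components to add with the same polarity.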


3.1.3  Energy  Detector  Based  Receivers 

Energy detection is a non-coherent detection technique that can be combined with many modulation schemes such as OOK, pulse position modulation (PPM) and frequency shift keying (FSK). Fig. 3.2 shows a conceptual architecture of an energy detector based receiver. The advantages of energy detection include no need for channel estimation and higher energy efficiency than traditional TR. The energy detector can alternatively be implemented by a diode operating in its square-law region, and an energy detector with OOK modulation can be of very low cost. A multiband energy detection receiver (with a filter bank) is a reasonable extension to support higher data rates and reject narrowband interference.

Synchronization and thresholding are challenging issues for an OOK energy detector receiver. Initial timing acquisition is difficult mainly because good PN codes with sharp autocorrelation cannot be used, and multipath distortion makes the situation worse. The optimal decision threshold for OOK with an energy detector may be found theoretically [31]. In practice, a real-time threshold must be available at the receiver, which motivates the search for adaptive-threshold strategies and effective algorithms.
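A minimal sketch of OOK demodulation with an energy detector is given below. It assumes perfect chip timing, a made-up received pulse shape and bounded uniform noise, and it sets the threshold at the midpoint between the average "on" and "off" energies measured over a known preamble. That midpoint rule is only one simple data-derived threshold chosen for illustration; it is not the testbed's algorithm or the theoretical optimum of [31].

```python
import random

# OOK energy detection with a preamble-derived threshold (illustrative only).
SAMPLES_PER_CHIP = 16
PULSE = [1.0, -1.0, 1.0, -1.0, 0.5, -0.5, 0.25, -0.25]  # hypothetical received pulse


def ook_waveform(chips, noise_amp, rng):
    """Received samples: pulse present for chip 1, absent for chip 0, plus noise."""
    samples = []
    for chip in chips:
        for n in range(SAMPLES_PER_CHIP):
            s = PULSE[n] if (chip and n < len(PULSE)) else 0.0
            samples.append(s + rng.uniform(-noise_amp, noise_amp))
    return samples


def chip_energies(samples):
    """Square-law detection followed by integration over each chip period."""
    return [sum(x * x for x in samples[i:i + SAMPLES_PER_CHIP])
            for i in range(0, len(samples), SAMPLES_PER_CHIP)]


def midpoint_threshold(preamble_energies, preamble_chips):
    """Midpoint between mean 'on' and mean 'off' energy over a known preamble."""
    on = [e for e, c in zip(preamble_energies, preamble_chips) if c == 1]
    off = [e for e, c in zip(preamble_energies, preamble_chips) if c == 0]
    return 0.5 * (sum(on) / len(on) + sum(off) / len(off))


def demodulate(energies, threshold):
    return [1 if e > threshold else 0 for e in energies]


rng = random.Random(0)
preamble = [1, 0, 1, 0, 1, 1, 0, 0]
payload = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
rx = ook_waveform(preamble + payload, noise_amp=0.1, rng=rng)
energies = chip_energies(rx)
threshold = midpoint_threshold(energies[:len(preamble)], preamble)
decoded = demodulate(energies[len(preamble):], threshold)

if __name__ == "__main__":
    print("threshold:", round(threshold, 3))
    print("decoded  :", decoded)
```

Because the threshold is recomputed from each burst's preamble, it follows changes in received signal strength without any channel estimation.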

Similar to TR receivers, gating or weighting the incoming signal has the potential to improve performance. To support this, the signal strength must be monitored in real time, and the weighting method can add too much computational complexity. In other words, implementing this feature can make the receiver less attractive because of the increased complexity.

Some research on the performance evaluation of the OOK energy detector scheme has been done recently [15]. The well-known Park’s model, which approximates the chi-square distribution, has been applied to the OOK energy detector to obtain a closed-form BER expression that accounts for ISI. The decision threshold is determined using the worst-case ISI patterns.


Figure  3.2:  Energy  detector  receiver. 


3.1.4  Prototyping 

The main goal is to build a proof-of-concept transmitter and receiver pair to test and verify various schemes. The testbed is expected to be flexible enough to accommodate several major transmission and reception techniques. The strategy is to develop the testbed based on our latest research work and to use commercial off-the-shelf components to expedite the project.

A module-based implementation methodology is adopted, mainly in implementing the RF front-end, to expedite the overall development. The pulse generator is based on the gated oscillation principle, which can be viewed as multiplying a digital pulse by an oscillation source. The major advantage of this pulse generator is the high controllability of its frequency band and spectral shape. The baseband part is implemented in an FPGA and the signal processing is fully in the digital domain, giving the testbed flexibility and programmability. A digital-to-analog converter (DAC) offering a sampling rate of giga-samples per second (Gsps) is employed to generate the desired waveform. The precoder challenge thus becomes one of mixed-signal circuitry, the high-speed FIR filter and the interfaces between them. The connection bus between the DAC and the FPGA baseband part is one of the bottlenecks, and potential solutions include the use of a bank of short coaxial cables. Even with the latest FPGA products, implementing the precoding and synchronization functions is extremely challenging.
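The gated oscillation principle described above can be checked numerically. The sketch below multiplies a rectangular gate by a cosine and locates the spectral peak with a direct DFT. The simulation sampling rate, gate width and observation length are assumptions picked so that the 4 GHz oscillator frequency falls exactly on a DFT bin; they are not testbed parameters.

```python
import cmath
import math

# Gated oscillation: a rectangular digital pulse multiplied by an oscillator.
FS_HZ = 16e9          # simulation sampling rate (assumed)
F_OSC_HZ = 4e9        # oscillator frequency, matching the 4.0 GHz local oscillator
N = 256               # number of samples in the observation window (assumed)
GATE_SAMPLES = 32     # gate width: 32 samples at 16 GHz = 2 ns (hypothetical)


def gated_oscillation(n_samples, gate_samples, f_osc_hz, fs_hz):
    """Multiply a rectangular gate by a cosine oscillator."""
    return [math.cos(2 * math.pi * f_osc_hz * n / fs_hz) if n < gate_samples else 0.0
            for n in range(n_samples)]


def dft_magnitude(x):
    """Magnitude of the DFT for bins 0 .. N/2 (direct O(N^2) evaluation)."""
    n_samples = len(x)
    return [abs(sum(xn * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, xn in enumerate(x)))
            for k in range(n_samples // 2 + 1)]


pulse = gated_oscillation(N, GATE_SAMPLES, F_OSC_HZ, FS_HZ)
spectrum = dft_magnitude(pulse)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_freq_hz = peak_bin * FS_HZ / N

if __name__ == "__main__":
    print(f"spectral peak at {peak_freq_hz / 1e9:.2f} GHz")
```

The oscillator sets where the spectrum is centered, while the gate (here a plain rectangle) sets its shape and width; replacing the rectangle by a shaped baseband pulse changes the spectral shape without moving the center.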

The task of radio system design is to provide an efficient solution under given theoretical and practical criteria. A top-down design flow covers many aspects, ranging from very high-level design to detailed implementation. A number of issues need to be considered: frequency band, architecture, data rate, modulation, synchronization, coexistence, interference, dynamic range, and many implementation issues. For a system with preprocessing for wideband applications like UWB, one of the major challenges lies in the waveform generator: specifically, how to choose a proper sampling rate and quantization resolution, and how to implement the algorithm efficiently. Based on our indoor experiments in the UWB band, a reduced-complexity pre-filter can function very well, and implementing time-reversal preprocessing for microwave-band communications is feasible [1]. A reference example of a time reversal radio testbed is presented in the following.

We adopt pulse-based signaling and transmitter-side processing as the system design guideline. Although direct pulse (carrierless) transmission can largely reduce the complexity of the transmitter RF front-end, it is not a good choice for a multi-purpose radio testbed, mainly because of its inflexibility. A modulated pulse is not only easy to generate but also more flexible: the center frequency is controlled by a local oscillator and the spectral shape is governed by the baseband pulse. The conceptual testbed architecture is shown in Fig. 2.1, where all baseband and control functions are implemented using FPGAs. Following an FIR filter (implemented in the FPGA), the DAC outputs the desired analog waveforms. The simple-receiver philosophy is reflected in this testbed by OOK or 2-PPM modulation and a diode-based non-coherent detector at the receiver. At the receiver, demodulation is done in the digital domain, so that algorithms and parameters can be adjusted easily.


3.2  Synchronization  Based  on  Energy  Detector 

For a burst-mode radio system, synchronization has always been a challenge. To make synchronization more robust on the testbed, we have made several improvements, including a modified frame structure, sample averaging for start pattern searching, and adjustment of various thresholds.


3.2.1  Synchronization  Review 

For UWB impulse radios, initial signal acquisition is extremely difficult because of the very narrow pulses with
ultra-low power and low duty cycle. In the testbed, the timing requirement is relaxed to the symbol level. Since the
energy detection employed in the testbed cannot identify signal polarity, the initial acquisition has to rely on a
unipolar sequence, whose autocorrelation is typically less sharp than that of a bipolar sequence. An optical orthogonal
code (OOC) is used in the testbed for synchronization. OOCs can be longer than Barker codes and exhibit better
autocorrelation properties, which is desirable for severe propagation cases. A two-stage synchronization strategy is
employed in the testbed. Only initial timing acquisition is implemented in the testbed; tracking is not considered.
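The autocorrelation property that makes a unipolar code usable for acquisition can be checked numerically. The following is a minimal sketch, assuming a short illustrative (7, 3, 1) optical orthogonal codeword with marks at positions 0, 1 and 3; this codeword and the function name are chosen for the example and are not the code used in the testbed.

```python
import numpy as np

def periodic_autocorrelation(code):
    """Return the periodic (cyclic) autocorrelation of a 0/1 sequence for every shift."""
    code = np.asarray(code, dtype=int)
    return np.array([int(np.dot(code, np.roll(code, shift)))
                     for shift in range(len(code))])

# Illustrative (7, 3, 1) codeword: length 7, weight 3, sidelobes bounded by 1.
# Assumption for this example only; the testbed code is not given in the text.
ooc = np.zeros(7, dtype=int)
ooc[[0, 1, 3]] = 1

acf = periodic_autocorrelation(ooc)
peak = int(acf[0])
max_sidelobe = int(acf[1:].max())
print("autocorrelation:", acf.tolist())
print("peak:", peak, "max sidelobe:", max_sidelobe)
```

The peak equals the code weight, while every nonzero shift overlaps in at most one position, which is the kind of margin a unipolar energy detector has to work with.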

The overall receiver functional diagram is shown in Fig. 3.3. The main modules include the data interface between
FPGA and ADC, integration, synchronization, decision, a finite state machine and thresholds control. Among them,
the main challenges at the receiver are the high-speed interface, fast integration and robust synchronization.
We are currently using a three-stage synchronization technique, which is shown in Fig. 3.5.

Fig. 3.4 shows the data flow for the receiver. First, the interface module converts the high-speed ADC output data to
relatively low-speed data streams through several steps. The data streams then go to the start pattern searching
module, which finds the time of arrival, and to the integration module for chip-level energy integration. There are 32
fast integrators, which are implemented by DSP cores embedded in the Virtex-5 FPGA. The synchronization module
combines the time-of-arrival signal and the integration results by finding the integrator with maximum energy, and
then obtains the chip threshold. After that, the receiver applies frame-level synchronization to find the fine timing and
make decisions. The finite state machine and the variable thresholds control module coordinate all the processes.
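The selection of the integrator with maximum energy can be sketched in software as follows. This is a minimal sketch, not the FPGA implementation: it assumes 32 samples per chip (40 ns chip period at 1.25 ns per sample), one integrator per sample offset, and non-negative detector output samples; the function name, the window length and the synthetic input are illustrative assumptions.

```python
import numpy as np

SAMPLES_PER_CHIP = 32  # 40 ns chip period / 1.25 ns sample period

def best_integrator(samples, window, n_chips):
    """Accumulate energy for 32 window offsets and return (best offset, energies).

    Integrator k sums the samples in a `window`-sample window that starts
    k samples into each chip, accumulated over `n_chips` chips.
    """
    samples = np.asarray(samples, dtype=float)
    energies = np.zeros(SAMPLES_PER_CHIP)
    for k in range(SAMPLES_PER_CHIP):
        for c in range(n_chips):
            start = c * SAMPLES_PER_CHIP + k
            energies[k] += samples[start:start + window].sum()
    return int(np.argmax(energies)), energies

# Synthetic detector output: a 5-sample pulse starting 10 samples into every chip,
# on top of a small positive noise floor (illustrative values only).
rng = np.random.default_rng(0)
n_chips, window, true_offset = 16, 5, 10
x = 0.05 * rng.random((n_chips + 2) * SAMPLES_PER_CHIP)
for c in range(n_chips + 2):
    begin = c * SAMPLES_PER_CHIP + true_offset
    x[begin:begin + window] += 1.0

offset, energies = best_integrator(x, window, n_chips)
print("estimated window offset:", offset)
```

In hardware the 32 accumulations run in parallel; the nested loop here only mirrors the arithmetic.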

3.2.2  Frame  Structure 

The system is supposed to find the window with maximum energy in every 40 ns, which is the chip period. Fig. 3.5
shows the new frame structure of the system. The first stage is to search the time of arrival; the second stage
is to average the energy of 128 chips to find the chip threshold; the third stage is to synchronize the whole frame. Data
demodulation is performed after the three-stage synchronization. Fig. 3.5 also shows the sliding window integration
diagram and the state transition diagram.

Figure 3.3: Receiver functional diagram

Figure 3.4: Receiver data flow

Figure 3.5: The new frame structure design

The modification of the synchronization head does not change the overall frame structure. The following parameters
are kept: frame length: 4096 chips or 163.84 µs; payload efficiency: 94.49%; chip sync accuracy: one
sample (1.25 ns); and frame sync accuracy: one chip (40 ns). If the overall clock accuracy is 20 ppm, then the
maximum time drift over one frame is $163.84\,\mu\text{s} \times 20 \times 10^{-6} \approx 3.28$ ns.
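The frame timing figures follow directly from the chip period, the frame length and the clock accuracy. The short sketch below reproduces the arithmetic, assuming the 20 ppm figure is the relative clock offset accumulated over one frame; the constant and function names are illustrative.

```python
# Frame timing figures quoted in the text.
CHIP_PERIOD_NS = 40.0
SAMPLE_PERIOD_NS = 1.25
FRAME_LENGTH_CHIPS = 4096
CLOCK_ACCURACY_PPM = 20.0

def frame_timing():
    """Return (frame duration in us, worst-case drift per frame in ns, samples per chip)."""
    frame_duration_ns = FRAME_LENGTH_CHIPS * CHIP_PERIOD_NS
    drift_ns = frame_duration_ns * CLOCK_ACCURACY_PPM * 1e-6
    samples_per_chip = CHIP_PERIOD_NS / SAMPLE_PERIOD_NS
    return frame_duration_ns / 1000.0, drift_ns, samples_per_chip

duration_us, drift_ns, samples_per_chip = frame_timing()
print(f"frame duration: {duration_us:.2f} us")
print(f"drift per frame at 20 ppm: {drift_ns:.4f} ns")
print(f"samples per chip: {samples_per_chip:.0f}")
```

Under these assumptions the drift accumulated over one frame is a few nanoseconds, well within the one-chip (40 ns) frame synchronization accuracy.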


3.2.3  Dense  Pulses 


However, even with averaging in the start pattern searching, we observed that synchronization errors still come
mainly from the first stage. In this stage, the threshold is very sensitive to the noise level and to the DC offset of ADC
sampling, especially for longer-distance transmission. So we send dense pulses, 5 chips long, at the beginning
of each frame to make sure that the time of arrival is found in every frame, since the average value of these chips
should be much higher than that of ordinary ones. Fig. 3.6 shows a captured picture of the dense pulses at the
beginning of each frame.


3.2.4  Start  Pattern  Averaging 

For the first synchronization stage, we accumulate the values of 32 consecutive samples and compare the sum with a
threshold to decide whether it marks a frame's time of arrival. However, because of interference and environmental
uncertainty, the first-stage result is not reliable when it depends on the value of only one chip time, since that
observation window is too short. So we average the results of two chip times and compare the averaged value with a
threshold. Fig. 3.7 shows the diagram of this improvement, where we set a stricter condition for claiming the time of
arrival: the comparison is made 5 times (or another number) instead of only once, and within these 5 chip times (or
another number), 3 high results are needed to claim first-stage synchronization. After that, because we send
dense pulses at the beginning of each frame and there is an interval between the pulses and the real chips, we delay
two chip times before the beginning of the second stage.

Figure 3.6: The picture of dense pulse at the beginning of each frame
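The first-stage decision rule can be sketched as follows. This is a minimal software sketch under stated assumptions: 32 samples per chip, averaging over two adjacent chip times, and a declaration when at least 3 of 5 consecutive comparisons exceed the threshold. The function name, the returned index convention and the synthetic input are illustrative and not taken from the FPGA design.

```python
import numpy as np

SAMPLES_PER_CHIP = 32

def find_start_pattern(samples, threshold, n_compare=5, n_required=3):
    """Return the chip index of the first high comparison in the first
    n_compare-chip window that holds at least n_required highs, or None."""
    samples = np.asarray(samples, dtype=float)
    n_chips = len(samples) // SAMPLES_PER_CHIP
    chip_sums = samples[:n_chips * SAMPLES_PER_CHIP].reshape(
        n_chips, SAMPLES_PER_CHIP).sum(axis=1)
    averaged = 0.5 * (chip_sums[:-1] + chip_sums[1:])   # two-chip average
    high = averaged > threshold
    for i in range(len(high) - n_compare + 1):
        window = high[i:i + n_compare]
        if window.sum() >= n_required:
            return i + int(np.argmax(window))
    return None

# Synthetic input: one isolated interference chip (chip 3) and dense pulses in chips 8..12.
x = np.zeros(20 * SAMPLES_PER_CHIP)
x[3 * SAMPLES_PER_CHIP:4 * SAMPLES_PER_CHIP] = 1.0
x[8 * SAMPLES_PER_CHIP:13 * SAMPLES_PER_CHIP] = 1.0

toa_chip = find_start_pattern(x, threshold=20.0)
spike_only = find_start_pattern(x[:6 * SAMPLES_PER_CHIP], threshold=20.0)
print("start pattern at chip:", toa_chip)
print("isolated spike alone:", spike_only)
```

The isolated chip is halved by the two-chip average and never crosses the threshold, whereas the 5-chip dense pulse produces enough consecutive high results to be declared.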


3.2.5  Thresholds  Adjustment 

For an energy-detection-based receiver, it is well known that performance depends significantly on the choice
of threshold level. A good threshold can be determined by using a channel quality indicator and feedback
information provided by the digital processor (back-end) at the receiver. There is an optimum threshold that gives the
lowest error rate. To deal with ISI, it is proposed to set the threshold based on the two worst signal cases.

One phenomenon discovered is that a mono-bit ADC along with proper thresholding can be applied to quantize
the pre-filter coefficients, and the resulting signal still gets focused. This enables the development of a unique time
reversal system with much lower complexity. Major points include: (1) optimization of the threshold for
quantization; (2) development of an adaptive threshold method; (3) impact of tap spacing and length of the pre-filter.
In our case, there are 4 variable thresholds at the receiver side, each corresponding to a different stage of
synchronization and detection. They are the threshold for start pattern searching, the threshold for ADC bias shift, the
code distance for frame-level synchronization and the threshold used for chip threshold determination at the second
synchronization stage. All these values are adjusted with the pushbuttons on the receiver Virtex-5 board, and the LEDs
show the adjustment level of any threshold.

DC offset causes performance degradation in signal processing systems, especially in high-speed applications. For
the ADC board at the receiver side, the DC shift makes the true quantization value hard to determine and thus makes
the threshold vulnerable, especially for the energy-detection-based receiver with OOK modulation. DC offset can be
reduced in real time by subtracting the mean amplitude from each sample, or by calibrating the ADC board.
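Mean subtraction can be sketched in a few lines. The sketch below assumes the mean is estimated either from the block itself or from a signal-free reference stretch of ADC output; the `reference` argument, the function name and the synthetic values are illustrative assumptions and not part of the testbed design.

```python
import numpy as np

def remove_dc_offset(samples, reference=None):
    """Subtract the mean amplitude from each sample.

    If `reference` is given (for example a signal-free stretch of ADC output),
    the mean is estimated from it; otherwise it is estimated from `samples`.
    """
    samples = np.asarray(samples, dtype=float)
    source = samples if reference is None else np.asarray(reference, dtype=float)
    return samples - source.mean()

rng = np.random.default_rng(1)
dc_offset = 0.3                                        # synthetic ADC bias, illustrative only
idle = dc_offset + 0.01 * rng.standard_normal(1024)    # signal-free stretch
burst = idle.copy()
burst[100:105] += 1.0                                  # one pulse on top of the bias
corrected = remove_dc_offset(burst, reference=idle)
print("mean before:", round(float(burst.mean()), 4))
print("idle-level mean after:", round(float(remove_dc_offset(idle).mean()), 6))
```

Estimating the mean from a signal-free stretch keeps the OOK pulses themselves from biasing the estimate; this is a design choice of the sketch rather than something stated in the text.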


3.2.6  Other  issues  considered 

For the third stage, we also used a 60-bit M-sequence instead of a 60-bit OOC for frame synchronization. This
is mainly because the randomness of the M-sequence is much better than that of the OOC.

For the time reversal and non-time-reversal systems, the received pulses differ considerably in width, typically
6 ns for time reversal and 15 ns for non-time reversal. So in the energy integration module, the integration window
size should be different for different signal scenarios; this parameter is to be adjusted in the integration module.

Figure 3.7: Start pattern searching improvement


3.3  Arbitrary  Waveform  Generator 


The arbitrary waveform generator (AWG) is one of the essential achievements of this project. The AWG provides a way
to implement a variety of waveforms for different kinds of systems, e.g. communication, radar or imaging systems,
which makes waveform diversity real and practical. Generally, an AWG consists of an RF part, a high-sampling-rate
digital-to-analog converter and a digital AWG.

In this project, the digital AWG is the core part of the back-end at the transmitter. The digital AWG can generate
virtually any type of transmitted waveform according to the design objectives, which greatly increases the capability
of the UWB communication system. The performance parameters of the current digital AWG are summarized as follows.
The quantization resolution is 8 bits, which is enough for most real waveforms. The sampling rate of the output digital
data is 1 GHz, which is much faster than any commercial systems. The number of channels is 2, so I/Q data can
be generated simultaneously. The number of waveform coefficients is 160. Data memory is added to the digital AWG:
data memory plus 8 groups of shift registers are used to store the waveform coefficients, which makes loading the
waveform coefficients into the FPGA much easier. Boolean operations and mathematical computation are used, which
can support the calculation of input data with higher quantization resolution.

How can we use an FPGA with a 550 MHz maximum clock rate to generate output digital data with a 1 GHz sampling
rate? The Parallel to Serial Converter provides the answer. The overall structure of the digital AWG for one channel is
shown in Figure 3.8. Waveform coefficients are stored in Data Memory. When the FPGA powers on, the waveform
coefficients are automatically loaded from Data Memory to the corresponding group of shift registers. There are 8
groups of shift registers, and each group corresponds to one Process Module. The clock rate of each group
of shift registers plus the corresponding Process Module is 125 MHz. Finally, the Parallel to Serial Converter is used
to generate the output digital data with a 1 GHz sampling rate.

For each channel, we have 8 groups of shift registers and each group can store 20 waveform coefficients, as shown in
Figure 3.9. So in total we can have 160 waveform coefficients for each channel.
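The way eight 125 MHz branches combine into one 1 GHz stream can be illustrated with a short sketch. It assumes an interleaved mapping in which coefficient n is stored in group n mod 8; the text does not state how the coefficients are distributed over the groups, so this mapping and the function names are assumptions for illustration.

```python
import numpy as np

N_GROUPS = 8            # groups of shift registers / Process Modules per channel
COEFFS_PER_GROUP = 20   # coefficients stored in each group
BRANCH_RATE_MHZ = 125   # clock rate of each parallel branch

def split_into_groups(coefficients):
    """Assign coefficient n to group n mod 8 (assumed interleaved mapping)."""
    c = np.asarray(coefficients)
    assert c.size == N_GROUPS * COEFFS_PER_GROUP
    return c.reshape(COEFFS_PER_GROUP, N_GROUPS).T    # shape (8, 20)

def parallel_to_serial(groups):
    """Interleave the eight low-rate branches into one high-rate stream."""
    return np.asarray(groups).T.reshape(-1)

waveform = np.arange(N_GROUPS * COEFFS_PER_GROUP)      # stand-in for 160 coefficients
groups = split_into_groups(waveform)
serial = parallel_to_serial(groups)
output_rate_mhz = N_GROUPS * BRANCH_RATE_MHZ

print("group 3 holds coefficients:", groups[3][:3].tolist(), "...")
print("output sample rate:", output_rate_mhz, "MHz")
```

Each branch only has to deliver one sample every 8 ns, which is what lets logic clocked at 125 MHz feed a 1 GHz output.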

Now the multiplication and addition shown in Figure 3.10 are exploited to perform the calculation. In this way, the
Verilog code is more compact and readable, and the calculation of input data with higher quantization resolution can
be supported.

Format  Regulation  uses  Boolean  operation  to  generate  complementary  code  for  the  following  processing  in  Process 
Module  according  to  the  scrambling  code  generated  by  Scrambling  Code  Generator  and  the  value  of  input  chip. 

The key module in the digital AWG is the high-speed Parallel to Serial Converter, the structure of which is shown in
Figure 3.11. In this way, the high-speed requirement on the output digital data is relaxed by parallel computation
and the Parallel to Serial Converter. The parallel-to-serial conversion is implemented with OSERDES logic
resources in the Virtex-5 FPGA. Each OSERDES component can support 6-to-1 parallel-to-serial conversion, but
we need 8-to-1 conversion, so two components are used to build one converter. One component is
called the master and the other the slave. The function diagram is shown in Figure 3.12. In our design, the sampling
rate of the input data from the 8 parallel branches is 125 MHz and the sampling rate of the output digital data is 1 GHz.

Figure 3.8: Overall structure of digital AWG for one channel.

3.4  High  Speed  FPGA-DAC  Interface 


The high-speed FPGA-DAC interface is challenging. Our Fujitsu MB86064 DAC supports dual-channel 14-bit operation
at a 1 GHz sampling rate. However, the earlier version of our testbed only reached 500 Msps with 2-bit resolution,
which is far below the capability of the DAC. This limited performance is due to the interface between the FPGA
evaluation board (EB) and the DAC development kit (DK). Since the FPGA EB of our earlier testbed version is not
designed for high-speed connection, synchronized pins are rare when the data rate is over 500 Mbps. In the final
testbed platform, we used a new FPGA EB, the ML550, which is designed for high-speed data transmission. Here we
introduce the ML550 and the connection between it and the DAC DK.

The ML550 Networking Interfaces Platform carries an XC5VLX50T-FFG1136 FPGA. The key features are:

• 64M × 8 DDR SDRAM memory

• Eight clock sources:
  - 200 MHz, 250 MHz, 133 MHz, and 33 MHz on-board oscillators
  - Two ICS8442 clock synthesizer devices

• One USB “B” port

• One 64 × 128 pixel LCD

• A System ACE CompactFlash (CF) Configuration Controller that allows storing FPGA configuration images

• Six Samtec LVDS connectors (a total of 53 differential input and 53 differential output channels)

• Onboard power regulators with 5% output margin test capabilities

• Power monitor connector for detailed current measurements on Vccint, Vccaux, and Vcco supplies

Figure 3.9: Diagram of shift registers.

The clock and interfacing components on the ML550 are very powerful. The differential SMA clock inputs are
connected to the global clock inputs of the FPGA. An onboard 200-MHz oscillator calibrates the I/O delay, and an
onboard 250-MHz oscillator is provided for general use. The two ICS8442 clock synthesizer devices output
differential LVDS clocks in the 31.25 MHz to 700 MHz range, which can drive LVDS data rates up to 1.4 Gbps. The
ML550 provides 53 pairs of transmit LVDS signals and 53 pairs of receive LVDS signals. These signals are distributed
across three Samtec QSE-DP connectors for transmitting and another three connectors for receiving. The number of
LVDS outputs and the LVDS data rate are sufficient to use the full capability of the DAC, namely dual-channel 14-bit
1 Gsps output.

Fujitsu and Xilinx developed an application to drive the DAC DK with the ML550. An interface adapter is built by
Fujitsu (Fig. 3.13). This adapter can be directly plugged onto the DAC DK 0.1” pitch header. Two SAMTEC EQCD
high-speed ribbon cables are used to connect the adapter and the ML550. The complete interface setup is illustrated
in Fig. 3.14. The ribbon cable supports data rates up to 2.84 Gbps. The pin assignments, i.e. the mappings between the
Fujitsu DAC input ports and the FPGA output ports, are tested and built into a user constraint file. This file can be
loaded into the FPGA design to drive the DAC properly.


3.4.1  Interface  Performance  Test 

Fujitsu and Xilinx provided an application note to verify this setup with the MB86065. An experiment showed
that this setup can drive single-port 14-bit 1.3 Gsps data into the DAC. In our application, however, we need to
verify its ability to drive dual-port 14-bit 1 Gsps data.

The purpose of the interface is to fully utilize the performance of the DAC, which is dual channel, 1 Gsps and
14-bit quantization. A test environment is built to test the performance. All DAC outputs are driven from the FPGA
with the interface solution. FPGA-driven waveforms are compared with those driven by the waveform memory module,
since the latter are considered to be correct. We use a ramp waveform as the test waveform. Two identical ramp
waveforms are generated by the FPGA to feed DAC ports A and B. They are both identical to the ones generated by
the waveform memory and they are perfectly synchronized. In another test, with results shown in Fig. 3.15, a 14-bit
mono-cycle 1 ns pulse is generated. The pulse width is slightly over 1 ns and there is a tail behind the pulse. The tail
is attributed to the circuit of the DAC evaluation board, because its properties do not change with cable length.
The spectrum of the mono-cycle pulse is depicted in Fig. 3.16, with a 3 dB bandwidth of around 1 GHz.
Another result is that bits 9 to 14 (the lower bits) of the DAC are under the noise level and cannot be observed. So the
effective configuration of the DAC becomes dual channel, 1 Gsps and 8-bit quantization, which is a great improvement
over the earlier version of the testbed.

Figure 3.10: Structure of Process Module.

In the test, we have verified that all I/Os are synchronized. The maximum data rate is limited not by the DAC and
interface, but by the FPGA. The maximum output rate of the FPGA is 1.4 GHz. If we want to upgrade our DAC to a
sampling rate over 1.4 GHz, we need another solution. In that case, a DAC with on-chip parallel-to-serial conversion
is necessary. For example, the MAXIM 4.3 Gsps DAC uses a 4:1 MUX to drive the DAC, and therefore the sampling
rate of each of the 4 data channels is reduced to 1.075 GHz, which is below the limit of the ML550 I/O interface.

Figure 3.11: Structure of Parallel to Serial Converter.

3.5  Challenges  in  System  Implementation 


3.5.1  High  Speed  A/D  and  D/A  Converters 

We use the MAX108 evaluation board to perform the analog-to-digital conversion at the receiver side. The
FPGA board and the ADC board are connected through 50 Ω SMA cables, with an ADC/FPGA interface
board that is plugged into the FPGA development board. The interface solution can support signals with frequencies
of up to several GHz.

For the design of the high-speed interface between ADC and FPGA, signal integrity has become a critical issue.
Many signal integrity problems are electromagnetic phenomena in nature and hence related to EMI/EMC. There
are two concerns for signal integrity: the timing and the quality of the signal. Signal timing mainly depends on
the delay caused by the physical length over which the signal must propagate. Signal waveform distortions can be
caused by reflection, crosstalk and power/ground noise. An interface board must be carefully designed to solve the
signal integrity issue.

A 4-layer PCB is designed and fabricated with a 50 Ω characteristic impedance for each trace. The PCB
layer stack is shown in Fig. 3.17. For each pair of LVPECL signals, the traces are designed to have the same length so
that the positive and negative signals experience the same delay.


3.5.2  FPGA  Implementation  Timing  Closure  Issues 

Tremendous effort has been made in developing the FPGA-based digital back-ends. One lesson we have learned
is that in the FPGA implementation phase timing closure is very critical in dealing with nanosecond-order signals,
especially when the area usage is high (say, 80%). FPGA technology advances rapidly and many high-speed, dense
FPGA products are on the market. We have been using the latest Xilinx Virtex-5 FPGA chips in the testbed,
which truly lessens the on-chip timing problem. However, when we push the sampling rate higher and higher, the
connection between the FPGA chips and the DAC or ADC becomes the new bottleneck.

Figure 3.12: Structure of OSERDES expansion.

For FPGA implementation, meeting timing requirements in speed-critical designs has always been a challenge, and
it is often an iterative process of ensuring that each and every path in the design meets the required timing. Meeting
timing closure is easy and automatic for relatively slow or small designs. However, most designs do not fall into
that happy category, and as each critical path is adjusted in order to meet timing, new ones are uncovered or created.
In the case of FPGAs implemented at the 65 nm technology node, for example, wire delays can account for
80 to 90% of each path's delay.

An important technique for achieving successful timing closure on an aggressive design is to carefully review the
most critical timing constraints. In our case, there are two critical paths at the transmitter: one is from the data
loading module to the waveform generator module, where huge parallel data streams exist; the other is
in the parallel-to-serial conversion module, which requires a high clock rate. At the receiver side, the most critical path
is from the integration module to the synchronization module, which performs fast timing acquisition in a parallel
manner. Applying clock-to-clock (global) timing constraints to all internal signals in a synchronous design may be
easy, but this simple approach will usually overconstrain the design and eventually lead to a failure. Therefore, we
have applied different constraints to different paths depending on their timing requirements.

FPGA design sizes have reached unprecedented levels, with million-gate parts becoming increasingly common.
However, timing closure on the larger and more complex FPGAs has become one of the more daunting issues
for designers to tackle without the right set of tools. The timing closure problem for these high-end FPGAs has its
root in the increased net delays in relation to the gate delays. The design size and interconnect issues together have
led to an inefficient iterative methodology in completing designs.
Figure 3.13: Passive interface adapter

Figure 3.14: Interface setup connecting ML550 and Fujitsu DAC DK

Figure 3.15: A 1 ns pulse generated by DAC.

Figure 3.16: The spectrum of a 1 ns pulse generated by DAC.

Figure 3.17: New interface board PCB layer stack, showing Internal Plane 1 (GND, POWER), Internal Plane 2 (3.3V) and the Bottom Layer


For the testbed, timing closure issues exist at both the transmitter side and the receiver side. Since the transmitter
side uses fewer resources and its logic is not as complex as that of the receiver side, the timing closure issue there is
relaxed. At the receiver side, however, timing closure is critical because of the large design size and complex logic,
even in the new Virtex-5 FPGA.


Typically, there are global timing constraints, offset constraints, specific path constraints and group constraints. In
some advanced designs, there are also area constraints, which enable partitioning of the design into physical regions
for mapping, packing, placement, and routing. The timing closure flow is shown in Fig. 3.18. Generally, designs
over 50 MHz should use timing constraints, and in the testbed the processing rate is up to 400 MHz in the interface
module. After applying global timing constraints, if the system meets the timing requirements, there is no need
to add other constraints. Otherwise, the place-and-route effort needs to be increased. If the design still fails to meet
timing, we need to find the specific paths and add critical path constraints. After that, we may need to run multi-pass
place and route, and we may also need to do some floorplanning. For special cases, area constraints may be
employed. If all these strategies still cannot achieve timing closure, then the only solution is to go back,
start over and rewrite the code.


Figure  3.18:  Xilinx  FPGA  timing  closure  flow 
The impact of timing constraints can be seen in Fig. 3.19, where the left picture shows the output waveform from
the DAC when appropriate timing constraints are applied, while the right one shows the waveform with unwanted
spikes when timing constraints are not properly added.


3.6  Some  Lessons  Learned 

Tremendous effort has been made in developing the FPGA-based digital back-ends. One lesson we have learned
is that in the FPGA implementation phase timing closure is very critical in dealing with nanosecond-order signals,
especially when the area usage is high (say, 80%). FPGA technology advances rapidly and many high-speed, dense
FPGA products are on the market. We have been using the latest Xilinx Virtex-5 FPGA chips in the testbed,
which truly lessens the on-chip timing problem. However, when we push the sampling rate higher and higher, the
connection between the FPGA chips and the DAC or ADC becomes the new bottleneck. At the transmitter side, this
signal integrity problem prevents us from reaching clock rates beyond 500 MHz. In other words, to break
this clock rate barrier we should seek a PCB-integrated FPGA/DAC solution.


Chapter  4 


Theoretical  Work  and  Forward  Looking 


4.1  Waveform  Optimization 

From a theoretical point of view, one byproduct of this project is a systematic study of waveform
optimization for wideband communication systems. Based on the transceiver scheme, radio environment and quality
of service (QoS) requirement, the transmitted wideband waveform can be carefully designed and optimized. This
means the waveform can be diverse and adaptive, far beyond the traditional fixed Gaussian pulse or the time-reversed
channel impulse response.

4.1.1  Wideband  Waveform  Optimization  for  Energy  Detector  Receiver  with  Practical  Considerations

System  Description  and  Optimal  Waveform 

The system architecture is shown in Figure 4.1. We limit our discussion to a single-user scenario, and consider the
transmitted signal with OOK modulation given by

$$ s(t) = \sum_{j=-\infty}^{\infty} d_j\, p(t - jT_b) \tag{4.1} $$

where $T_b$ is the symbol duration, $p(t)$ is the transmitted symbol waveform defined over $[0, T_p]$ and
$d_j \in \{0, 1\}$ is the $j$-th transmitted bit. Without loss of generality, assume the minimal propagation delay is
equal to zero. The energy of $p(t)$ is $E_p$,

$$ \int_{0}^{T_p} p^2(t)\, dt = E_p \tag{4.2} $$

The received noise-polluted signal at the output of the receiver front-end filter is

$$ r(t) = h(t) \otimes s(t) + n(t) = \sum_{j=-\infty}^{\infty} d_j\, x(t - jT_b) + n(t), \tag{4.3} $$

Figure 4.1: System architecture.

where $h(t)$, $t \in [0, T_h]$, is the multipath impulse response that takes into account the effect of the channel
impulse response and the RF front-ends in the transceivers, including antennas. $h(t)$ is available at the transmitter
[32] [33]. "$\otimes$" denotes the convolution operation. $n(t)$ is a low-pass additive zero-mean Gaussian noise with
one-sided bandwidth $W$ and one-sided power spectral density $N_0$. $x(t)$ is the received noiseless symbol-"1"
waveform defined as

$$ x(t) = h(t) \otimes p(t) \tag{4.4} $$

We further assume that $T_b \geq T_h + T_p \overset{\text{def}}{=} T_x$, i.e. there is no ISI.


An energy detector performs a square operation on $r(t)$ without any explicit analog filter at the receiver. Then the
integrator integrates over a given integration window $T_I$. Corresponding to the time index $k$, the $k$-th
decision statistic at the output of the integrator is given by

$$ z_k = \int_{kT_b + T_{I0}}^{kT_b + T_{I0} + T_I} r^2(t)\, dt \tag{4.5} $$

$$ \phantom{z_k} = \int_{kT_b + T_{I0}}^{kT_b + T_{I0} + T_I} \left( d_k\, x(t - kT_b) + n(t) \right)^2 dt \tag{4.6} $$

where $T_{I0}$ is the starting time of integration for each symbol and $0 \leq T_{I0} < T_{I0} + T_I \leq T_x \leq T_b$.


An approximately equivalent SNR for the energy detector receiver, which provides the same detection performance
when applied to a coherent receiver, is given as [34]

$$ \mathrm{SNR}_{eq} = \frac{2 \left( \int_{T_{I0}}^{T_{I0}+T_I} x^2(t)\, dt \right)^2}{2.3\, T_I W N_0^2 + N_0 \int_{T_{I0}}^{T_{I0}+T_I} x^2(t)\, dt} \tag{4.7} $$

For best performance, the equivalent SNR $\mathrm{SNR}_{eq}$ should be maximized. Define,

$$ E_I = \int_{T_{I0}}^{T_{I0}+T_I} x^2(t)\, dt \tag{4.8} $$

For given $T_I$, $N_0$ and $W$, $\mathrm{SNR}_{eq}$ is an increasing function of $E_I$. So the maximization of
$\mathrm{SNR}_{eq}$ in Eq. (4.7) is equivalent to the maximization of $E_I$ in Eq. (4.8).

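So the optimization problem to get the optimal $p(t)$ is shown below,

$$ \max_{p(t)} \int_{T_{I0}}^{T_{I0}+T_I} x^2(t)\, dt \quad \text{s.t.} \quad \int_{0}^{T_p} p^2(t)\, dt = E_p \tag{4.9} $$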
In order to solve the optimization problem (4.9), a numerical approach is employed. In other words, $p(t)$, $h(t)$ and
$x(t)$ are uniformly sampled (assumed at the Nyquist rate), and the optimization problem (4.9) is converted to its
corresponding discrete-time form. Assume the sampling period is $T_s$, with $T_p/T_s = N_p$, $T_h/T_s = N_h$ and
$T_x/T_s = N_x$. So $N_x = N_p + N_h$.


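$p(t)$, $h(t)$ and $x(t)$ are represented by $p_i$, $i = 0, 1, \ldots, N_p$, $h_i$, $i = 0, 1, \ldots, N_h$ and
$x_i$, $i = 0, 1, \ldots, N_x$, respectively [34].

Define,

$$ \mathbf{p} = [p_0 \;\; p_1 \;\; \cdots \;\; p_{N_p}]^T \tag{4.10} $$

and

$$ \mathbf{x} = [x_0 \;\; x_1 \;\; \cdots \;\; x_{N_x}]^T \tag{4.11} $$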
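Construct the channel matrix $\mathbf{H}_{(N_x+1) \times (N_p+1)}$,

$$ (\mathbf{H})_{i,j} = \begin{cases} h_{i-j}, & 0 \leq i - j \leq N_h \\ 0, & \text{else} \end{cases} \tag{4.12} $$

where $(\cdot)_{i,j}$ denotes the entry in the $i$-th row and $j$-th column of the matrix or vector. Meanwhile, for a
vector, taking $\mathbf{p}$ as an example, $(\mathbf{p})_{i,1}$ is equivalent to $p_{i-1}$.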

The matrix expression of Eq. (4.4) is,

$$ \mathbf{x} = \mathbf{H}\mathbf{p} \tag{4.13} $$

and the constraint in the optimization problem (4.9) can be expressed as,

$$ \|\mathbf{p}\|_2^2\, T_s = E_p \tag{4.14} $$

where "$\|\cdot\|_2$" denotes the norm-2 of the vector. In order to make the whole document consistent, we further
assume,

$$ \|\mathbf{p}\|_2^2 = 1 \tag{4.15} $$

Let $T_I/T_s = N_I$ and $T_{I0}/T_s = N_{I0}$. The entries in $\mathbf{x}$ within the integration window constitute $\mathbf{x}_I$ as,

$$\mathbf{x}_I = [x_{N_{I0}}\ x_{N_{I0}+1}\ \cdots\ x_{N_{I0}+N_I}]^T \tag{4.16}$$

and $E_I$ in Eq. (4.8) can be equivalently shown as,

$$E_I = \|\mathbf{x}_I\|_2^2\, T_s \tag{4.17}$$

Simply dropping $T_s$ in $E_I$ will not affect the optimization objective, so $E_I$ is redefined as,

$$E_I = \|\mathbf{x}_I\|_2^2 \tag{4.18}$$

Similar to Eq. (4.13), $\mathbf{x}_I$ can be obtained by,

$$\mathbf{x}_I = \mathbf{H}_I \mathbf{p} \tag{4.19}$$

where $(\mathbf{H}_I)_{i,j} = (\mathbf{H})_{N_{I0}+i,\,j}$ and $i = 1, 2, \ldots, N_I + 1$ as well as $j = 1, 2, \ldots, N_p + 1$.

The optimization problem (4.9) can be represented by its discrete-time form as,

$$\begin{aligned} \max\ & E_I \\ \text{s.t.}\ & \|\mathbf{p}\|_2^2 = 1 \end{aligned} \tag{4.20}$$

The optimal solution $\mathbf{p}^*$ for the optimization problem (4.20) is the dominant eigenvector in the following eigen-function [34],

$$\mathbf{H}_I^T \mathbf{H}_I \mathbf{p} = \lambda \mathbf{p} \tag{4.21}$$

Furthermore, $E_I^*$ will be obtained by Eq. (4.18) and Eq. (4.19).
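The discrete-time solution of (4.20) can be illustrated with a short numerical sketch. The function names `channel_matrix` and `optimal_waveform`, and the use of NumPy, are assumptions made here for illustration and are not part of the original testbed software. The code builds $\mathbf{H}$ as in Eq. (4.12), extracts the window rows as in Eq. (4.19), and takes the dominant eigenvector of $\mathbf{H}_I^T\mathbf{H}_I$ as in Eq. (4.21).

```python
import numpy as np

def channel_matrix(h, n_p):
    """Convolution matrix of size (N_x+1) x (N_p+1) with entries h[i-j], Eq. (4.12)."""
    n_h = len(h) - 1
    H = np.zeros((n_h + n_p + 1, n_p + 1))
    for j in range(n_p + 1):
        H[j:j + n_h + 1, j] = h          # column j is h delayed by j samples
    return H

def optimal_waveform(h, n_p, n_i0, n_i):
    """Unit-norm waveform maximizing the energy inside the integration window."""
    H = channel_matrix(np.asarray(h, dtype=float), n_p)
    H_I = H[n_i0:n_i0 + n_i + 1, :]      # rows x_{N_I0} ... x_{N_I0+N_I}
    w, v = np.linalg.eigh(H_I.T @ H_I)   # eigenvalues in ascending order
    p = v[:, -1]                         # dominant eigenvector, ||p||_2 = 1
    e_i = float(np.sum((H_I @ p) ** 2))  # E_I = ||H_I p||_2^2, Eq. (4.18)
    return p, e_i
```

Because `numpy.linalg.eigh` returns orthonormal eigenvectors, the constraint $\|\mathbf{p}\|_2^2 = 1$ holds without further scaling; the sign of the returned vector is arbitrary and does not change $E_I$.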


Trade-Off  between  Energies  Within  and  Outside  Integration  Window 

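The energy outside of the integration window sometimes has to be taken into account, for example when ISI has to be considered. In order to reduce ISI, the energies within and outside of the integration window should be balanced, which means the energy within the integration window should be maximized and the energy outside of the integration window should be minimized.

The entries in $\mathbf{x}$ outside of the integration window constitute $\mathbf{x}_{\bar{I}}$ as,

$$\mathbf{x}_{\bar{I}} = [x_0\ \cdots\ x_{N_{I0}-1}\ x_{N_{I0}+N_I+1}\ \cdots\ x_{N_x}]^T \tag{4.22}$$

and the energy outside of the integration window $E_{\bar{I}}$ can be expressed as,

$$E_{\bar{I}} = \|\mathbf{x}_{\bar{I}}\|_2^2 \tag{4.23}$$

Similar to Eq. (4.19), $\mathbf{x}_{\bar{I}}$ can be obtained by,

$$\mathbf{x}_{\bar{I}} = \mathbf{H}_{\bar{I}} \mathbf{p} \tag{4.24}$$

where $(\mathbf{H}_{\bar{I}})_{i,j} = (\mathbf{H})_{i,j}$ when $i = 1, \ldots, N_{I0}$ and $(\mathbf{H}_{\bar{I}})_{i-N_I-1,\,j} = (\mathbf{H})_{i,j}$ when $i = N_{I0} + N_I + 2, \ldots, N_x + 1$, as well as $j = 1, 2, \ldots, N_p + 1$.

In order to balance the energies within and outside of the integration window, the trade-off factor $\alpha$ is introduced. The range of $\alpha$ is from 0 to 1. Given $\alpha$, the optimization problem is formulated as follows,

$$\begin{aligned} \max\ & \alpha E_I - (1-\alpha) E_{\bar{I}} \\ \text{s.t.}\ & \|\mathbf{p}\|_2^2 = 1 \end{aligned} \tag{4.25}$$

The optimal solution $\mathbf{p}^*$ for the optimization problem (4.25) is the dominant eigenvector in the following eigen-function,

$$\left[\alpha \mathbf{H}_I^T \mathbf{H}_I - (1-\alpha) \mathbf{H}_{\bar{I}}^T \mathbf{H}_{\bar{I}}\right] \mathbf{p} = \lambda \mathbf{p} \tag{4.26}$$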
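The trade-off between the energy inside and outside the integration window can be sketched in the same way. The function name `tradeoff_waveform` and its arguments are hypothetical; the matrix `H` is assumed to be the channel matrix of Eq. (4.12), and the rows inside and outside the window play the roles of $\mathbf{H}_I$ and $\mathbf{H}_{\bar{I}}$.

```python
import numpy as np

def tradeoff_waveform(H, n_i0, n_i, alpha):
    """Dominant eigenvector of alpha*H_I^T H_I - (1-alpha)*H_out^T H_out, Eq. (4.26)."""
    inside = np.zeros(H.shape[0], dtype=bool)
    inside[n_i0:n_i0 + n_i + 1] = True           # rows within the integration window
    H_in, H_out = H[inside], H[~inside]
    A = alpha * H_in.T @ H_in - (1.0 - alpha) * H_out.T @ H_out
    w, v = np.linalg.eigh(A)                     # A is symmetric, possibly indefinite
    p = v[:, -1]                                 # eigenvector of the largest eigenvalue
    e_in = float(np.sum((H_in @ p) ** 2))        # energy inside the window
    e_out = float(np.sum((H_out @ p) ** 2))      # energy outside the window
    return p, e_in, e_out
```

With `alpha = 1.0` the sketch reduces to the solution of (4.20); smaller values of `alpha` trade captured energy for less leakage outside the window.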


Binary  Waveform 

If the transmitted waveform is constrained to the binary waveform because of the hardware limitation or implementation simplicity, which means $p_i$, $i = 0, 1, \ldots, N_p$ is equal to $-\frac{1}{\sqrt{N_p+1}}$ or $\frac{1}{\sqrt{N_p+1}}$, then the optimization problem is expressed as,

$$\begin{aligned} \max\ & E_I \\ \text{s.t.}\ & p_i^2 = \frac{1}{N_p+1},\ i = 0, 1, \ldots, N_p \end{aligned} \tag{4.27}$$

One suboptimal solution to the optimization problem (4.27) is derived from the optimal solution $\mathbf{p}^*$ of the optimization problem (4.20). When $\mathbf{p}^*$ is obtained, then

$$(\mathbf{p}_{b1})_{i,1} = \begin{cases} \frac{1}{\sqrt{N_p+1}}, & (\mathbf{p}^*)_{i,1} \ge 0 \\ -\frac{1}{\sqrt{N_p+1}}, & (\mathbf{p}^*)_{i,1} < 0 \end{cases} \tag{4.28}$$

This simple method leads to the optimal solution to the optimization problem (4.27) when $T_I = 0$, which can be proved by the Cauchy-Schwarz inequality, but if $T_I$ is greater than zero, there is still room for improvement over the suboptimal solution obtained from Eq. (4.28).
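The sign projection of Eq. (4.28) is simple enough to state as a sketch. The function name `binary_from_optimal` is an assumption for illustration; its input is taken to be the optimal real-valued waveform $\mathbf{p}^*$ from (4.20).

```python
import numpy as np

def binary_from_optimal(p_opt):
    """Map each entry of p* to +/- 1/sqrt(N_p+1) according to its sign, Eq. (4.28)."""
    p_opt = np.asarray(p_opt, dtype=float)
    n = p_opt.size                                   # n = N_p + 1 coefficients
    return np.where(p_opt >= 0, 1.0, -1.0) / np.sqrt(n)
```

When the integration window contains a single sample, $\mathbf{H}_I$ is one row and the objective is the square of an inner product, so matching the signs of that row maximizes it over all binary waveforms; for longer windows the projection is only a heuristic.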


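It is well known that the optimization problem (4.27) is a Quadratically Constrained Quadratic Program (QCQP) and that a general QCQP is NP-hard, so a semidefinite relaxation method is proposed to give a suboptimal solution to this optimization problem.

Define,

$$\mathbf{P} = \mathbf{p}\mathbf{p}^T \tag{4.29}$$

$\mathbf{P}$ should be a symmetric positive semidefinite matrix, i.e. $\mathbf{P} \succeq 0$, and the rank of $\mathbf{P}$ should be equal to 1. Reformulate $E_I$ as,

$$\begin{aligned} E_I &= \mathbf{p}^T \mathbf{H}_I^T \mathbf{H}_I \mathbf{p} && (4.30) \\ &= \mathrm{trace}\left(\mathbf{H}_I^T \mathbf{H}_I \mathbf{p}\mathbf{p}^T\right) && (4.31) \\ &= \mathrm{trace}\left(\mathbf{H}_I^T \mathbf{H}_I \mathbf{P}\right) && (4.32) \end{aligned}$$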

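The rank constraint is a nonconvex constraint, so after dropping it, the QCQP is relaxed to the Semidefinite Program (SDP),

$$\begin{aligned} \max\ & \mathrm{trace}\left(\mathbf{H}_I^T \mathbf{H}_I \mathbf{P}\right) \\ \text{s.t.}\ & (\mathbf{P})_{i,i} = \frac{1}{N_p+1},\ i = 1, 2, \ldots, N_p + 1 \\ & \mathbf{P} \succeq 0 \end{aligned} \tag{4.33}$$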


The optimal solution $\mathbf{P}^*$ of the optimization problem (4.33) can be obtained by using the CVX tool [35], and the value of the objective function in the optimization problem (4.33) gives an upper bound on the optimal value of the optimization problem (4.27). By projecting the dominant eigenvector of $\mathbf{P}^*$ onto $\pm\frac{1}{\sqrt{N_p+1}}$ as in Eq. (4.28), the suboptimal solution $\mathbf{p}_{b2}$ is achieved [36].

Finally, the designed binary waveform is,

$$\mathbf{p}_b = \arg\max_{\mathbf{p} \in \{\mathbf{p}_{b1},\, \mathbf{p}_{b2}\}} \mathbf{p}^T \mathbf{H}_I^T \mathbf{H}_I \mathbf{p} \tag{4.34}$$

Ternary Waveform

If the transmitted waveform is constrained to the ternary waveform, which means $p_i$, $i = 0, 1, \ldots, N_p$ is equal to one of three levels, i.e. $-c$, $0$ or $c$, then the optimization problem is expressed as,

$$\begin{aligned} \max\ & E_I \\ \text{s.t.}\ & p_i^2 = c^2 \text{ or } 0,\ i = 0, 1, \ldots, N_p \\ & \|\mathbf{p}\|_2^2 = 1 \end{aligned} \tag{4.35}$$

where the value of $c$ will be determined later.


The optimization problem (4.35) is still NP-hard and can be approximately reformulated as,

$$\begin{aligned} \max\ & E_I \\ \text{s.t.}\ & \mathrm{Cardinality}(\mathbf{p}) \le k \\ & 1 \le k \le N_p + 1 \\ & \|\mathbf{p}\|_2^2 = 1 \end{aligned} \tag{4.36}$$

where $\mathrm{Cardinality}(\mathbf{p})$ denotes the number of non-zero entries of $\mathbf{p}$; the cardinality constraint is also a nonconvex constraint.

Because $k$ is an integer between 1 and $N_p + 1$, the optimization problem (4.36) can be decomposed into $N_p + 1$ independent and parallel sub-problems, each of which is shown as,

$$\begin{aligned} \max\ & E_I \\ \text{s.t.}\ & \mathrm{Cardinality}(\mathbf{p}) \le k \\ & \|\mathbf{p}\|_2^2 = 1 \end{aligned} \tag{4.37}$$

where $k$ is equal to $1, 2, \ldots,$ or $N_p + 1$.


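The sub-problems (4.37) can be solved in parallel and then the solutions are combined to get the solution of the original optimization problem (4.35). Reusing the definition in Eq. (4.29), the sub-problem (4.37) can be converted to the following SDP by semidefinite relaxation combined with the $\ell_1$ heuristic [36],

$$\begin{aligned} \max\ & \mathrm{trace}\left(\mathbf{H}_I^T \mathbf{H}_I \mathbf{P}\right) \\ \text{s.t.}\ & \mathrm{trace}(\mathbf{P}) = 1 \\ & \mathbf{a}^T |\mathbf{P}|\, \mathbf{a} \le k \\ & \mathbf{P} \succeq 0 \end{aligned} \tag{4.38}$$

where $\mathbf{a}$ is the all-1 column vector and,

$$\begin{aligned} \|\mathbf{p}\|_2^2 &= \mathbf{p}^T \mathbf{p} && (4.39) \\ &= \mathrm{trace}\left(\mathbf{p}\mathbf{p}^T\right) && (4.40) \\ &= \mathrm{trace}(\mathbf{P}) && (4.41) \end{aligned}$$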

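The CVX tool [35] is also used to get the optimal solution $\mathbf{P}_k^*$ of SDP (4.38). From the dominant eigenvector $\mathbf{p}_k^*$ of $\mathbf{P}_k^*$ and the threshold $p_{thk}$, the solution for the sub-problem (4.37) can be achieved as,

$$(\mathbf{p}_{tk})_{i,1} = \begin{cases} c_k, & (\mathbf{p}_k^*)_{i,1} > p_{thk} \\ 0, & |(\mathbf{p}_k^*)_{i,1}| \le p_{thk} \\ -c_k, & (\mathbf{p}_k^*)_{i,1} < -p_{thk} \end{cases} \tag{4.42}$$

where

$$\begin{aligned} p_{thk} = \arg\max_{p_{th}}\ & (\mathbf{p}_{tk})^T \mathbf{H}_I^T \mathbf{H}_I \mathbf{p}_{tk} \\ \text{s.t.}\ & \mathrm{Cardinality}(\mathbf{p}_{tk}) \le k \end{aligned} \tag{4.43}$$

and

$$c_k = \frac{1}{\sqrt{\mathrm{Cardinality}(\mathbf{p}_{tk})}} \tag{4.44}$$

Finally, the designed ternary waveform is,

$$\mathbf{p}_t = \arg\max_{\mathbf{p} \in \{\mathbf{p}_{tk},\ k = 1, 2, \ldots, N_p+1\}} \mathbf{p}^T \mathbf{H}_I^T \mathbf{H}_I \mathbf{p} \tag{4.45}$$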


Sparse  Waveform 


Due to the sparse characteristics of the channel impulse response or implementation limitations, the transmitted waveform sometimes should be sparse, which means that within the waveform duration some of the waveform coefficients are zeros. In this way, the consumption of hardware resources will be greatly reduced. The optimization goal is to design a sparse waveform with the least number of non-zero waveform coefficients that still captures a high percentage of the received energy compared with the optimal waveform.

Generally, it is hard to get the optimal sparse waveform directly. However, an iterative algorithm derived from the optimization problem (4.37) and its corresponding relaxation (4.38) can give us reasonable solutions.

The iterative algorithm is based on a loop of $k$ from 1 to $N_p + 1$. For each $k$, the optimal waveform for the optimization problem (4.38) can be obtained. Then, the absolute values of the waveform coefficients are sorted from the smallest to the biggest. We can set some of the smaller values to zero such that the energy of the remaining values is above some threshold, for example, 95 percent of the total waveform energy. Accordingly, the sparse waveform for the given $k$ is achieved and the true sparsity of the waveform $\hat{k}$ can also be obtained. Sometimes $\hat{k}$ is not equal to $k$. Finally, after the iterative algorithm is done, the relationship between the optimal energy and the sparsity is determined. Based on this relationship, we can choose a reasonable sparse waveform such that the energy loss in the receiver is tolerable.
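The thresholding step inside the loop can be sketched as follows. The function name `sparsify` and the parameter `keep` (the fraction of waveform energy to retain, 0.95 in the example from the text) are assumptions for illustration; the input is any waveform obtained for a given $k$.

```python
import numpy as np

def sparsify(p, keep=0.95):
    """Zero the smallest-magnitude coefficients while retaining at least
    `keep` of the waveform energy; returns the sparse waveform and its
    number of non-zero coefficients."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(np.abs(p))              # indices, smallest magnitude first
    dropped = np.cumsum(p[order] ** 2)         # energy lost when zeroing a prefix
    budget = (1.0 - keep) * np.sum(p ** 2)     # energy we are allowed to lose
    n_zero = int(np.searchsorted(dropped, budget, side="right"))
    sparse = p.copy()
    sparse[order[:n_zero]] = 0.0
    return sparse, int(np.count_nonzero(sparse))
```

Calling this for each $k$ and recording the returned count alongside the resulting window energy gives the energy-versus-sparsity relationship mentioned above.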

Another phenomenon found by the simulation is that some of the waveform coefficients in the optimal waveform obtained from the optimization problem (4.38) are very small compared with the other waveform coefficients. These small waveform coefficients can therefore be set directly to zero without noticeably affecting the system performance.


Peak-to-Average  Power  Ratio 

PAPR is one of the major concerns in waveform design. Because of the nonlinearity caused by nonlinear devices such as the Digital-to-Analog Converter (DAC) and Power Amplifier (PA), the maximal transmitted power has to be backed off, resulting in inefficient utilization. PAPR in OFDM has been well studied. Here PAPR is handled under a unified optimization framework. It is defined as,

$$\mathrm{PAPR} = \frac{\|\mathbf{p}\|_\infty^2}{\|\mathbf{p}\|_2^2 / (N_p + 1)} \tag{4.46}$$

where

$$\|\mathbf{p}\|_\infty = \max\left(|p_0|, |p_1|, \ldots, |p_{N_p}|\right) \tag{4.47}$$


If $\|\mathbf{p}\|_2$ is given, reducing PAPR is equivalent to setting an upper bound for $\|\mathbf{p}\|_\infty$. So the optimization problem can be expressed as,

$$\begin{aligned} \max\ & E_I \\ \text{s.t.}\ & \|\mathbf{p}\|_2^2 = 1 \\ & \|\mathbf{p}\|_\infty \le ub \end{aligned} \tag{4.48}$$

The bound constraint $\|\mathbf{p}\|_\infty \le ub$ can also be written as,

$$-ub \le p_i \le ub,\ i = 0, 1, \ldots, N_p \tag{4.49}$$

which can be further simplified as,

$$p_i^2 \le (ub)^2,\ i = 0, 1, \ldots, N_p \tag{4.50}$$


Reusing the definition in Eq. (4.29), the optimization problem (4.48) can be relaxed to an SDP,

$$\begin{aligned} \max\ & \mathrm{trace}\left(\mathbf{H}_I^T \mathbf{H}_I \mathbf{P}\right) \\ \text{s.t.}\ & (\mathbf{P})_{i,i} \le (ub)^2,\ i = 1, 2, \ldots, N_p + 1 \\ & \mathrm{trace}(\mathbf{P}) = 1 \\ & \mathbf{P} \succeq 0 \end{aligned} \tag{4.51}$$

By the CVX tool [35], the optimal solution $\mathbf{P}^*$ of the optimization problem (4.51) is obtained. If the rank of $\mathbf{P}^*$ is equal to 1, then the dominant eigenvector of $\mathbf{P}^*$ will be the optimal solution $\mathbf{p}^*$ for the optimization problem (4.48). But if the rank of $\mathbf{P}^*$ is not equal to 1, the dominant eigenvector of $\mathbf{P}^*$ cannot be treated as the optimal solution for the optimization problem (4.48), because it violates the bound constraint.

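So a computationally efficient iterative algorithm is proposed to get the suboptimal solution $\mathbf{p}^*$ to the optimization problem (4.48) as follows.

1. Initialization: $P = 1$, $\mathbf{H}_0 = \mathbf{H}_I^T \mathbf{H}_I$ and $\mathbf{p}^*$ is set to be the all-0 column vector.

2. Solve the following optimization problem to get the optimal $\mathbf{q}$.

$$\begin{aligned} \max\ & \mathbf{q}^T \mathbf{H}_0 \mathbf{q} \\ \text{s.t.}\ & \|\mathbf{q}\|_2^2 = P \end{aligned} \tag{4.52}$$

3. Find $i$, such that $|(\mathbf{q})_{i,1}|$ is the maximal value in the set $\{|(\mathbf{q})_{j,1}| : |(\mathbf{q})_{j,1}| > ub\}$. If $\{i\} = \emptyset$, then the algorithm is terminated and $\mathbf{p}^* := \mathbf{p}^* + \mathbf{q}$. Otherwise go to step 4.

4. If $(\mathbf{q})_{i,1}$ is greater than zero, then $(\mathbf{p}^*)_{i,1}$ is set to be $ub$. Otherwise $(\mathbf{p}^*)_{i,1}$ is set to be $-ub$.

5. $P := P - (ub)^2$ and set $(\mathbf{H}_0)_{i,j}$, $j = 1, 2, \ldots, N_p + 1$ and $(\mathbf{H}_0)_{j,i}$, $j = 1, 2, \ldots, N_p + 1$ all to zeros. Go to step 2.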
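The five steps above can be sketched in NumPy. This is an illustration under stated assumptions, not the report's implementation: the function name `papr_limited_waveform` is hypothetical, and instead of zeroing row and column $i$ of $\mathbf{H}_0$ the sketch restricts the eigenproblem to the indices that have not been clipped yet, which is equivalent for non-zero eigenvalues and numerically more convenient.

```python
import numpy as np

def papr_limited_waveform(H_I, ub):
    """Iterative clipping heuristic for problem (4.48) with ||p||_2^2 = 1."""
    n = H_I.shape[1]
    if ub * ub * n < 1.0:
        raise ValueError("ub is too small for a unit-energy waveform")
    H0 = H_I.T @ H_I
    p = np.zeros(n)
    free = np.ones(n, dtype=bool)        # entries not yet fixed at +/- ub
    power = 1.0                          # remaining energy budget P
    while free.any() and power > 0.0:
        idx = np.flatnonzero(free)
        w, v = np.linalg.eigh(H0[np.ix_(idx, idx)])
        q = np.sqrt(power) * v[:, -1]    # step 2: dominant eigenvector, ||q||^2 = P
        if not (np.abs(q) > ub + 1e-12).any():
            p[idx] = q                   # step 3: nothing exceeds the bound, finish
            break
        j = int(np.argmax(np.abs(q)))    # largest violating entry
        p[idx[j]] = np.sign(q[j]) * ub   # step 4: clip it to +/- ub
        free[idx[j]] = False             # step 5: remove it from the eigenproblem
        power -= ub * ub
    return p
```

An entry can exceed `ub` only if the remaining budget is larger than `ub**2`, so the budget stays positive whenever a clipping step is taken.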
Figure 4.2: System architecture.

4.1.2  Wideband  Waveform  Optimization  for  Multiple  Input  Single  Output  Cognitive  Radio  with 
Practical  Considerations 

Wideband Waveform Optimization Using Cauchy-Schwarz Inequality-based Iterative Method


The system architecture considered in this document is shown in Figure 4.2. We limit our discussion to the scenario of a single pair of cognitive radios. There are $N$ antennas at the transmitter and one antenna at the receiver. OOK modulation is used for transmission. Thus the transmitted signal at the transmitter antenna $n$ is,

$$s_n(t) = \sum_{j=-\infty}^{\infty} d_j\, p_n(t - jT_b) \tag{4.53}$$

where $T_b$ is the bit duration, $p_n(t)$ is the transmitted bit waveform defined over $[0, T_p]$ at the transmitter antenna $n$ and $d_j \in \{0, 1\}$ is the $j$-th transmitted bit. Without loss of generality, the minimal propagation delay is assumed to be zero. The energy of the transmitted waveforms is $E_p$,

$$\sum_{n=1}^{N} \int_0^{T_p} p_n^2(t)\,dt = E_p \tag{4.54}$$


The received noise-polluted signal at the output of the low noise amplifier (LNA) is,

$$\begin{aligned} r(t) &= \sum_{n=1}^{N} h_n(t) \otimes s_n(t) + n(t) \\ &= \sum_{j=-\infty}^{\infty} d_j \sum_{n=1}^{N} x_n(t - jT_b) + n(t) \end{aligned} \tag{4.55}$$

where $h_n(t)$, $t \in [0, T_h]$, is the multipath impulse response that takes into account the effect of the channel impulse response, the RF front-ends in the transceivers such as the power amplifier, LNA and arbitrary notch filter, as well as the antennas between the transmitter antenna $n$ and the receiver antenna. $h_n(t)$ is available at the transmitter [32] [33]. $\int_0^{T_h} h_n^2(t)\,dt = E_{nh}$. "$\otimes$" denotes the convolution operation. $n(t)$ is AWGN. $x_n(t)$ is the received noiseless bit-"1" waveform defined as

$$x_n(t) = h_n(t) \otimes p_n(t) \tag{4.56}$$

We further assume that $T_b \ge T_h + T_p \overset{\text{def}}{=} T_x$, i.e. there is no ISI.

If the waveforms at different transmitter antennas are assumed to be synchronized, the $k$-th decision statistic is,

$$\begin{aligned} r(kT_b + t_0) &= \sum_{j=-\infty}^{\infty} d_j \sum_{n=1}^{N} x_n(kT_b + t_0 - jT_b) + n(t) \\ &= d_k \sum_{n=1}^{N} x_n(t_0) + n(t) \end{aligned} \tag{4.57}$$

In order to maximize the system performance, $\sum_{n=1}^{N} x_n(t_0)$ should be maximized. Thus the optimization problem can be formulated as follows to get the optimal waveforms $p_n(t)$,

$$\begin{aligned} \text{maximize}\ & \sum_{n=1}^{N} x_n(t_0) \\ \text{subject to}\ & \sum_{n=1}^{N} \int_0^{T_p} p_n^2(t)\,dt \le E_p \\ & 0 \le t_0 \le T_b \end{aligned} \tag{4.58}$$


An iterative method is proposed here to give the optimal solution to the optimization problem (4.58). This method is a computationally efficient algorithm. For simplicity in the following presentation, $t_0$ is assumed to be zero, which will not degrade the optimum of the solution if such a solution exists. Define,

$$x(t) = \sum_{n=1}^{N} x_n(t) \tag{4.59}$$

Taking the Fourier transform of Eq. (4.56),

$$x_{nf}(f) = h_{nf}(f)\, p_{nf}(f) \tag{4.60}$$

and

$$x_f(f) = \sum_{n=1}^{N} h_{nf}(f)\, p_{nf}(f) \tag{4.61}$$

where $x_{nf}(f)$, $h_{nf}(f)$ and $p_{nf}(f)$ are the frequency domain representations of $x_n(t)$, $h_n(t)$ and $p_n(t)$ respectively. $x_f(f)$ is the frequency domain representation of $x(t)$. Thus,

$$x(0) = \sum_{n=1}^{N} x_n(0) \tag{4.62}$$

and, from the inverse Fourier transform evaluated at $t = 0$,

$$x_n(0) = \int_{-\infty}^{\infty} x_{nf}(f)\,df \tag{4.63}$$


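If there is no spectral mask constraint, then according to the Cauchy-Schwarz inequality,

$$\begin{aligned} x(0) &= \sum_{n=1}^{N} \int_{-\infty}^{\infty} h_{nf}(f)\, p_{nf}(f)\,df \\ &\le \sum_{n=1}^{N} \sqrt{\int_{-\infty}^{\infty} |h_{nf}(f)|^2\,df \int_{-\infty}^{\infty} |p_{nf}(f)|^2\,df} \\ &\le \sqrt{\sum_{n=1}^{N} \int_{-\infty}^{\infty} |h_{nf}(f)|^2\,df}\ \sqrt{\sum_{n=1}^{N} \int_{-\infty}^{\infty} |p_{nf}(f)|^2\,df} \\ &= \sqrt{E_p \sum_{n=1}^{N} E_{nh}} \end{aligned} \tag{4.64}$$

When $p_{nf}(f) = \alpha\, h_{nf}^*(f)$ for all $f$ and $n$, the two equalities are obtained, where

$$\alpha = \sqrt{\frac{E_p}{\sum_{n=1}^{N} E_{nh}}} \tag{4.65}$$

In this case, $p_n(t) = \alpha\, h_n(-t)$, which means the optimal waveform $p_n(t)$ is the corresponding time reversed multipath impulse response $h_n(t)$.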
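The time reversal result can be checked numerically with sampled impulse responses. In the sketch below the function name `time_reversal_waveforms` is hypothetical, sample energies stand in for the integrals (the sampling period is dropped), and each waveform is delayed by the channel length so that it is causal; the focusing peak then appears at sample index $N_h$ instead of $t_0 = 0$.

```python
import numpy as np

def time_reversal_waveforms(h_list, E_p):
    """Return p_n = alpha * (time-reversed h_n) with total energy E_p, Eq. (4.65)."""
    total = sum(float(np.dot(h, h)) for h in h_list)   # discrete analogue of sum_n E_nh
    alpha = np.sqrt(E_p / total)
    return [alpha * np.asarray(h, dtype=float)[::-1] for h in h_list]
```

Summing `np.convolve(h_n, p_n)` over the antennas gives a received waveform whose peak equals $\sqrt{E_p \sum_n E_{nh}}$, the upper bound in Eq. (4.64).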

If there is a spectral mask constraint, then the optimization problem becomes more complicated,

$$\begin{aligned} \text{maximize}\ & x(0) \\ \text{subject to}\ & \sum_{n=1}^{N} \int_0^{T_p} p_n^2(t)\,dt \le E_p \\ & |p_{nf}(f)|^2 \le c_{nf}(f) \end{aligned} \tag{4.66}$$

where $c_{nf}(f)$ represents the arbitrary spectral mask constraint at the transmitter antenna $n$.
Because $p_{nf}(f)$ is complex valued, the phase and the modulus of $p_{nf}(f)$ should be determined. Meanwhile,

$$x(0) = \int_{-\infty}^{\infty} x_f(f)\,df \tag{4.67}$$

and,

$$x_f(f) = \sum_{n=1}^{N} |h_{nf}(f)|\,|p_{nf}(f)|\, e^{j2\pi\left(\arg(h_{nf}(f)) + \arg(p_{nf}(f))\right)} \tag{4.68}$$

where the angular component of the complex value is $\arg(\cdot)$.

For the real value signal $x(t)$,

$$x_f(-f) = x_f^*(f) \tag{4.69}$$

where "$*$" denotes the conjugate operation. Thus,

$$x_f(-f) = \sum_{n=1}^{N} |h_{nf}(f)|\,|p_{nf}(f)|\, e^{-j2\pi\left(\arg(h_{nf}(f)) + \arg(p_{nf}(f))\right)} \tag{4.70}$$

and $x_f(f) + x_f(-f)$ is equal to

$$2\sum_{n=1}^{N} |h_{nf}(f)|\,|p_{nf}(f)| \cos\left(2\pi\left(\arg(h_{nf}(f)) + \arg(p_{nf}(f))\right)\right) \tag{4.71}$$

If $h_{nf}(f)$ and $|p_{nf}(f)|$ are given for all $f$ and $n$, maximization of $x(0)$ is equivalent to,

$$\arg(h_{nf}(f)) + \arg(p_{nf}(f)) = 0 \tag{4.72}$$

which means the angular component of $p_{nf}(f)$ is the negative angular component of $h_{nf}(f)$.

The optimization problem (4.66) can be simplified as,

$$\begin{aligned} \text{maximize}\ & \sum_{n=1}^{N} \int_{-\infty}^{\infty} |h_{nf}(f)|\,|p_{nf}(f)|\,df \\ \text{subject to}\ & \sum_{n=1}^{N} \int_{-\infty}^{\infty} |p_{nf}(f)|^2\,df \le E_p \\ & |p_{nf}(f)|^2 \le c_{nf}(f) \end{aligned} \tag{4.73}$$


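Because,

$$|h_{nf}(f)| = |h_{nf}(-f)| \tag{4.74}$$

$$|p_{nf}(f)| = |p_{nf}(-f)| \tag{4.75}$$

$$|c_{nf}(f)| = |c_{nf}(-f)| \tag{4.76}$$

for all $f$ and $n$, only the uniformly spaced discrete frequency points $f_0, \ldots, f_M$ are considered in the optimization problem (4.73). Here $f_0$ corresponds to the DC component and $f_1, \ldots, f_M$ correspond to the positive frequency components.

Define column vectors $\mathbf{h}_f$, $\mathbf{h}_{1f}, \ldots, \mathbf{h}_{Nf}$,

$$\mathbf{h}_f = [\mathbf{h}_{1f}^T\ \mathbf{h}_{2f}^T\ \cdots\ \mathbf{h}_{Nf}^T]^T \tag{4.77}$$

$$(\mathbf{h}_{nf})_i = \begin{cases} |h_{nf}(f_{i-1})|, & i = 1 \\ \sqrt{2}\,|h_{nf}(f_{i-1})|, & i = 2, \ldots, M+1 \end{cases} \tag{4.78}$$

where "$T$" denotes the transpose operation.

Define column vectors $\mathbf{p}_f$, $\mathbf{p}_{1f}, \ldots, \mathbf{p}_{Nf}$,

$$\mathbf{p}_f = [\mathbf{p}_{1f}^T\ \mathbf{p}_{2f}^T\ \cdots\ \mathbf{p}_{Nf}^T]^T \tag{4.79}$$

$$(\mathbf{p}_{nf})_i = \begin{cases} |p_{nf}(f_{i-1})|, & i = 1 \\ \sqrt{2}\,|p_{nf}(f_{i-1})|, & i = 2, \ldots, M+1 \end{cases} \tag{4.80}$$

Define column vectors $\mathbf{c}_f$, $\mathbf{c}_{1f}, \ldots, \mathbf{c}_{Nf}$,

$$\mathbf{c}_f = [\mathbf{c}_{1f}^T\ \mathbf{c}_{2f}^T\ \cdots\ \mathbf{c}_{Nf}^T]^T \tag{4.81}$$

$$(\mathbf{c}_{nf})_i = \begin{cases} \sqrt{|c_{nf}(f_{i-1})|}, & i = 1 \\ \sqrt{2\,|c_{nf}(f_{i-1})|}, & i = 2, \ldots, M+1 \end{cases} \tag{4.82}$$

Thus, the discrete version of the optimization problem (4.73) is shown below,

$$\begin{aligned} \text{maximize}\ & \mathbf{h}_f^T \mathbf{p}_f \\ \text{subject to}\ & \|\mathbf{p}_f\|_2^2 \le E_p \\ & 0 \le \mathbf{p}_f \le \mathbf{c}_f \end{aligned} \tag{4.83}$$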


An iterative algorithm (Algorithm I) is shown as follows to give the optimal solution to the optimization problem (4.83); it was proposed in [37] and is extended here to waveform design in the context of MISO cognitive radio:

1. Initialization: $P = E_p$ and $\mathbf{p}_f^*$ is set to be the all-0 column vector.

2. Solve the following optimization problem to get the optimal $\mathbf{q}_f^*$ using the Cauchy-Schwarz inequality.

$$\begin{aligned} \text{maximize}\ & \mathbf{h}_f^T \mathbf{q}_f \\ \text{subject to}\ & \|\mathbf{q}_f\|_2^2 \le P \end{aligned} \tag{4.84}$$

3. Find $i$, such that $(\mathbf{q}_f^*)_i$ is the maximal value in the set $\{(\mathbf{q}_f^*)_j \mid (\mathbf{q}_f^*)_j > (\mathbf{c}_f)_j\}$. If $\{i\} = \emptyset$, then the method is terminated and $\mathbf{p}_f^* := \mathbf{p}_f^* + \mathbf{q}_f^*$. Otherwise go to step 4.

4. Set $(\mathbf{p}_f^*)_i = (\mathbf{c}_f)_i$.

5. $P := P - (\mathbf{c}_f)_i^2$ and set $(\mathbf{h}_f)_i$ to zero. If $\|\mathbf{h}_f\|_2^2$ is equal to zero, then the algorithm is terminated; otherwise go to step 2.

When $\mathbf{p}_f^*$ is obtained for the optimization problem (4.83), from Eq. (4.72), Eq. (4.79) and Eq. (4.80), the optimal $p_{nf}(f)$ and the corresponding $p_n(t)$ can be readily obtained.
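Algorithm I can be sketched in a few lines. The function name `mask_limited_spectrum` is an assumption for illustration; its inputs are the stacked magnitude vectors $\mathbf{h}_f$ and $\mathbf{c}_f$ of Eq. (4.77) and Eq. (4.81) and the energy budget $E_p$. Step 2 has the closed form $\mathbf{q}_f^* = \sqrt{P}\,\mathbf{h}_f / \|\mathbf{h}_f\|_2$ by the Cauchy-Schwarz inequality.

```python
import numpy as np

def mask_limited_spectrum(h_f, c_f, E_p):
    """Iteratively clip the matched spectrum to the mask, Algorithm I for (4.83)."""
    h = np.asarray(h_f, dtype=float).copy()
    c = np.asarray(c_f, dtype=float)
    p = np.zeros_like(h)
    power = float(E_p)                               # remaining energy budget P
    while power > 0.0 and np.linalg.norm(h) > 0.0:
        q = np.sqrt(power) * h / np.linalg.norm(h)   # step 2: Cauchy-Schwarz solution
        viol = np.flatnonzero(q > c + 1e-12)         # entries above the mask
        if viol.size == 0:
            p += q                                   # step 3: feasible, finish
            break
        i = viol[np.argmax(q[viol])]                 # largest violating entry
        p[i] = c[i]                                  # step 4: pin it to the mask
        power -= c[i] ** 2                           # step 5: update budget
        h[i] = 0.0                                   #         and drop the entry
    return p
```

For example, with `h_f = [3, 4]`, `c_f = [1, 0.5]` and `E_p = 1`, the second entry is pinned at 0.5 and the remaining energy 0.75 goes to the first entry.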

Wideband  Waveform  Optimization  Using  SDP-Based  Iterative  method 

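The $p_n(t)$ and the $h_n(t)$ are uniformly sampled at Nyquist rate. Assume the sampling period is $T_s$, with $T_p/T_s = N_p$, where $N_p$ is assumed to be even, and $T_h/T_s = N_h$. $p_n(t)$ and $h_n(t)$ are represented by $p_{ni}$, $i = 0, 1, \ldots, N_p$ and $h_{ni}$, $i = 0, 1, \ldots, N_h$ respectively.

Define,

$$\mathbf{p}_n = [p_{n0}\ p_{n1}\ \cdots\ p_{nN_p}]^T \tag{4.85}$$

and

$$\mathbf{h}_n = [h_{nN_h}\ h_{n(N_h-1)}\ \cdots\ h_{n0}]^T \tag{4.86}$$

If $N_p = N_h$, then $\sum_{n=1}^{N} x_n(t_0)$ can be equivalent to $\sum_{n=1}^{N} \mathbf{h}_n^T \mathbf{p}_n$. Define,

$$\mathbf{p} = [\mathbf{p}_1^T\ \mathbf{p}_2^T\ \cdots\ \mathbf{p}_N^T]^T \tag{4.87}$$

and

$$\mathbf{h} = [\mathbf{h}_1^T\ \mathbf{h}_2^T\ \cdots\ \mathbf{h}_N^T]^T \tag{4.88}$$

Thus,

$$\sum_{n=1}^{N} x_n(t_0) = \mathbf{h}^T \mathbf{p} \tag{4.89}$$

Maximization of $\mathbf{h}^T\mathbf{p}$ is the same as maximization of $(\mathbf{h}^T\mathbf{p})^2$ as long as $\mathbf{h}^T\mathbf{p}$ is equal to or greater than zero.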

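$$\begin{aligned} (\mathbf{h}^T\mathbf{p})^2 &= (\mathbf{h}^T\mathbf{p})^T (\mathbf{h}^T\mathbf{p}) \\ &= \mathbf{p}^T \mathbf{h}\mathbf{h}^T \mathbf{p} \\ &= \mathrm{trace}\left(\mathbf{h}\mathbf{h}^T \mathbf{p}\mathbf{p}^T\right) \\ &= \mathrm{trace}(\mathbf{H}\mathbf{P}) \end{aligned} \tag{4.90}$$

where $\mathbf{H} = \mathbf{h}\mathbf{h}^T$ and $\mathbf{P} = \mathbf{p}\mathbf{p}^T$. $\mathbf{P}$ should be a rank-1 positive semidefinite matrix. However, the rank constraint is a nonconvex constraint, which will be omitted in the following optimization problems. Thus the optimization objective in the optimization problem (4.66) can be reformulated as,

$$\text{maximize}\ \mathrm{trace}(\mathbf{H}\mathbf{P}) \tag{4.91}$$

Meanwhile,

$$\begin{aligned} \|\mathbf{p}\|_2^2 &= \mathbf{p}^T\mathbf{p} \\ &= \mathrm{trace}\left(\mathbf{p}\mathbf{p}^T\right) \\ &= \mathrm{trace}(\mathbf{P}) \end{aligned} \tag{4.92}$$

Thus the energy constraint in the optimization problem (4.66) can be reformulated as,

$$\mathrm{trace}(\mathbf{P}) \le E_p \tag{4.93}$$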

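For cognitive radio, there is a spectral mask constraint for the transmitted waveform. Based on the previous discussion, $\mathbf{p}_n$ is assumed to be the transmitted waveform, and $\mathbf{F}$ is the discrete-time Fourier transform operator, thus the frequency domain representation of $\mathbf{p}_n$ is,

$$\mathbf{p}_{fn} = \mathbf{F}\mathbf{p}_n \tag{4.94}$$

where $\mathbf{p}_{fn}$ is a complex value vector. If the $i$-th row of $\mathbf{F}$ is $\mathbf{f}_i$, then each complex value in $\mathbf{p}_{fn}$ can be represented by,

$$(\mathbf{p}_{fn})_{i,1} = \mathbf{f}_i \mathbf{p}_n,\ i = 1, 2, \ldots, \frac{N_p}{2} + 1 \tag{4.95}$$

Define,

$$\mathbf{F}_i = \mathbf{f}_i^H \mathbf{f}_i,\ i = 1, 2, \ldots, \frac{N_p}{2} + 1 \tag{4.96}$$

Given the spectral mask constraint in terms of power spectral density $\mathbf{c}_n = [c_{n1}\ c_{n2}\ \cdots\ c_{n(N_p/2+1)}]^T$, so

$$\begin{aligned} |(\mathbf{p}_{fn})_{i,1}|^2 &= |\mathbf{f}_i \mathbf{p}_n|^2 \\ &= \mathbf{p}_n^T \mathbf{f}_i^H \mathbf{f}_i \mathbf{p}_n \\ &= \mathbf{p}_n^T \mathbf{F}_i \mathbf{p}_n \\ &\le c_{ni},\ i = 1, 2, \ldots, \frac{N_p}{2} + 1 \end{aligned} \tag{4.97}$$

where $|\cdot|$ is the modulus of the complex value.

Define the selection matrix $\mathbf{S}_n \in \mathbb{R}^{(N_p+1) \times (N_p+1)N}$,

$$(\mathbf{S}_n)_{i,j} = \begin{cases} 1, & j = i + (N_p+1)(n-1) \\ 0, & \text{else} \end{cases} \tag{4.98}$$

So,

$$\mathbf{p}_n = \mathbf{S}_n \mathbf{p} \tag{4.99}$$

and

$$\begin{aligned} |(\mathbf{p}_{fn})_{i,1}|^2 &= \mathbf{p}_n^T \mathbf{F}_i \mathbf{p}_n \\ &= \mathbf{p}^T \mathbf{S}_n^T \mathbf{F}_i \mathbf{S}_n \mathbf{p} \\ &= \mathrm{trace}\left(\mathbf{S}_n^T \mathbf{F}_i \mathbf{S}_n \mathbf{p}\mathbf{p}^T\right) \\ &= \mathrm{trace}\left(\mathbf{S}_n^T \mathbf{F}_i \mathbf{S}_n \mathbf{P}\right) \end{aligned} \tag{4.100}$$

The optimization problem (4.66) can be reformulated as an SDP based on (4.91), (4.93), (4.97) and (4.100),

$$\begin{aligned} \text{maximize}\ & \mathrm{trace}(\mathbf{H}\mathbf{P}) \\ \text{subject to}\ & \mathrm{trace}(\mathbf{P}) \le E_p \\ & \mathrm{trace}\left(\mathbf{S}_n^T \mathbf{F}_i \mathbf{S}_n \mathbf{P}\right) \le c_{ni} \\ & i = 1, 2, \ldots, \frac{N_p}{2} + 1 \\ & n = 1, 2, \ldots, N \end{aligned} \tag{4.101}$$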


If the optimal solution $\mathbf{P}^*$ to the optimization problem (4.101) is a rank-1 matrix, then the optimal waveforms can be obtained from the dominant eigenvector of $\mathbf{P}^*$. Otherwise, $E_p$ in the optimization problem (4.101) should be decreased to get a rank-1 optimal solution $\mathbf{P}^*$ that satisfies all the other constraints.

An SDP-based iterative algorithm (Algorithm II) is proposed to get the rank-1 optimal solution $\mathbf{P}^*$:

1. Initialization of $E_p$.

2. Solve the optimization problem (4.101) and get the optimal solution $\mathbf{P}^*$.

3. If the ratio of the dominant eigenvalue of $\mathbf{P}^*$ to $\mathrm{trace}(\mathbf{P}^*)$ is less than 0.99, then set $E_p$ to be $\mathrm{trace}(\mathbf{P}^*)$ and go to step 2; otherwise, the algorithm is terminated.

The optimal waveforms can be obtained from the dominant eigenvector of $\mathbf{P}^*$ and Eq. (4.87).
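The outer loop of Algorithm II can be sketched independently of the SDP solver. In the code below, `solve_sdp` is a hypothetical callable supplied by the user that solves (4.101) for a given $E_p$ and returns $\mathbf{P}^*$ as a symmetric NumPy array (for instance a wrapper around CVX or a comparable tool); it is not an API of any particular library. Scaling the dominant eigenvector by the square root of the dominant eigenvalue, and the `max_iter` guard, are also assumptions of this sketch.

```python
import numpy as np

def dominant_ratio(P):
    """Ratio of the dominant eigenvalue of P to trace(P); equals 1 for rank-1 P."""
    w = np.linalg.eigvalsh(P)
    return float(w[-1] / np.trace(P))

def shrink_until_rank_one(solve_sdp, E_p, threshold=0.99, max_iter=50):
    """Re-solve the SDP with E_p := trace(P*) until P* is (nearly) rank-1."""
    for _ in range(max_iter):
        P = solve_sdp(E_p)                        # step 2 (hypothetical solver call)
        if dominant_ratio(P) >= threshold:        # step 3: accept near rank-1 P*
            w, v = np.linalg.eigh(P)
            return np.sqrt(w[-1]) * v[:, -1], E_p
        E_p = float(np.trace(P))                  # otherwise shrink the budget
    raise RuntimeError("no near-rank-1 solution within max_iter iterations")
```

The iteration guard matters because the loop makes no progress if the solver returns a solution whose trace already equals the current $E_p$ while failing the rank test.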


Peak-to-Average  Power  Ratio 

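PAPR is one of the major concerns in waveform design. Because of the nonlinearity caused by nonlinear devices such as the Digital-to-Analog Converter (DAC) and Power Amplifier (PA), the maximal transmitted power has to be backed off, resulting in inefficient utilization. PAPR in OFDM has been well studied. Here PAPR is handled under a unified optimization framework. It is defined as,

$$\mathrm{PAPR} = \frac{\|\mathbf{p}_n\|_\infty^2}{\|\mathbf{p}_n\|_2^2 / (N_p + 1)} \tag{4.102}$$

where

$$\|\mathbf{p}_n\|_\infty = \max\left(|p_{n0}|, |p_{n1}|, \ldots, |p_{nN_p}|\right) \tag{4.103}$$

If the denominator of Eq. (4.102) is omitted, reducing PAPR is equivalent to setting an upper bound for $\|\mathbf{p}_n\|_\infty$. The bound constraint $\|\mathbf{p}_n\|_\infty \le b_n$ can also be written as,

$$-b_n \le p_{ni} \le b_n,\ i = 0, 1, \ldots, N_p \tag{4.104}$$

which can be further simplified as,

$$p_{ni}^2 \le (b_n)^2,\ i = 0, 1, \ldots, N_p \tag{4.105}$$

Define the selection vector $\mathbf{s}_{ni} \in \mathbb{R}^{1 \times (N_p+1)N}$,

$$(\mathbf{s}_{ni})_{1,j} = \begin{cases} 1, & j = i + (N_p+1)(n-1) \\ 0, & \text{else} \end{cases} \tag{4.106}$$

So,

$$p_{ni} = \mathbf{s}_{ni}\mathbf{p} \tag{4.107}$$

and

$$\begin{aligned} p_{ni}^2 &= (\mathbf{s}_{ni}\mathbf{p})^2 \\ &= (\mathbf{s}_{ni}\mathbf{p})^T (\mathbf{s}_{ni}\mathbf{p}) \\ &= \mathbf{p}^T \mathbf{s}_{ni}^T \mathbf{s}_{ni} \mathbf{p} \\ &= \mathrm{trace}\left(\mathbf{s}_{ni}^T \mathbf{s}_{ni} \mathbf{p}\mathbf{p}^T\right) \\ &= \mathrm{trace}\left(\mathbf{s}_{ni}^T \mathbf{s}_{ni} \mathbf{P}\right) \end{aligned} \tag{4.108}$$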




The optimization problem (4.66) or the optimization problem (4.101) together with PAPR consideration can be presented as an SDP,

$$\begin{aligned} \text{maximize}\ & \mathrm{trace}(\mathbf{H}\mathbf{P}) \\ \text{subject to}\ & \mathrm{trace}(\mathbf{P}) \le E_p \\ & \mathrm{trace}\left(\mathbf{S}_n^T \mathbf{F}_i \mathbf{S}_n \mathbf{P}\right) \le c_{ni} \\ & \mathrm{trace}\left(\mathbf{s}_{ni}^T \mathbf{s}_{ni} \mathbf{P}\right) \le (b_n)^2 \\ & n = 1, 2, \ldots, N \end{aligned} \tag{4.109}$$

Similarly, if the optimal solution $\mathbf{P}^*$ to the optimization problem (4.109) is a rank-1 matrix, then the optimal waveforms can be obtained from the dominant eigenvector of $\mathbf{P}^*$. Otherwise, $E_p$ in the optimization problem (4.109) should be decreased to get a rank-1 optimal solution $\mathbf{P}^*$ that satisfies all the other constraints.

An SDP-based iterative algorithm (Algorithm III) is proposed to get the rank-1 optimal solution $\mathbf{P}^*$:

1. Initialization of $E_p$.

2. Solve the optimization problem (4.109) and get the optimal solution $\mathbf{P}^*$.

3. If the ratio of the dominant eigenvalue of $\mathbf{P}^*$ to $\mathrm{trace}(\mathbf{P}^*)$ is less than 0.99, then set $E_p$ to be $\mathrm{trace}(\mathbf{P}^*)$ and go to step 2; otherwise, the algorithm is terminated.

The optimal waveforms can be obtained from the dominant eigenvector of $\mathbf{P}^*$ and Eq. (4.87).


Binary  Waveform 


If the transmitted waveform is constrained to the binary waveform because of the hardware limitation or implementation simplicity, or equivalently if $p_{ni}^2 = \frac{E_p}{(N_p+1)N}$, the optimization problem (4.66) or the optimization problem (4.101) together with binary waveform design can be formulated as an SDP,

$$\begin{aligned} \text{maximize}\ & \mathrm{trace}(\mathbf{H}\mathbf{P}) \\ \text{subject to}\ & \mathrm{trace}(\mathbf{P}) = E_p \\ & \mathrm{trace}\left(\mathbf{S}_n^T \mathbf{F}_i \mathbf{S}_n \mathbf{P}\right) \le c_{ni} \\ & \mathrm{trace}\left(\mathbf{s}_{ni}^T \mathbf{s}_{ni} \mathbf{P}\right) = \frac{E_p}{(N_p+1)N} \\ & \text{for all } i \text{ and } n = 1, 2, \ldots, N \end{aligned} \tag{4.110}$$

However, the constraints $\mathrm{trace}(\mathbf{P}) = E_p$ and $\mathrm{trace}\left(\mathbf{s}_{ni}^T \mathbf{s}_{ni} \mathbf{P}\right) = \frac{E_p}{(N_p+1)N}$ lead to a non-rank-1 optimal solution $\mathbf{P}^*$ or to an invalid solution, i.e. no feasible region for the optimization problem because of the constraints, for the optimization problem (4.110). Thus the equality constraints are relaxed to inequality constraints and the optimization problem (4.110) is relaxed to,

$$\begin{aligned} \text{maximize}\ & \mathrm{trace}(\mathbf{H}\mathbf{P}) \\ \text{subject to}\ & \mathrm{trace}(\mathbf{P}) \le E_p \\ & \mathrm{trace}\left(\mathbf{S}_n^T \mathbf{F}_i \mathbf{S}_n \mathbf{P}\right) \le c_{ni} \\ & \mathrm{trace}\left(\mathbf{s}_{ni}^T \mathbf{s}_{ni} \mathbf{P}\right) \le \frac{E_p}{(N_p+1)N} \\ & \text{for all } i \text{ and } n = 1, 2, \ldots, N \end{aligned} \tag{4.111}$$

However, such relaxation forces us to verify the feasibility of the optimal solution $\mathbf{P}^*$ to the optimization problem (4.111). If the dominant eigenvalue of $\mathbf{P}^*$ is the same as $E_p$, which means $\mathrm{trace}\left(\mathbf{s}_{ni}^T \mathbf{s}_{ni} \mathbf{P}^*\right)$ is equal to $\frac{E_p}{(N_p+1)N}$ for all $i$ and $n$, then the optimal solution $\mathbf{P}^*$ is feasible and the optimal binary waveforms can be obtained from the dominant eigenvector of $\mathbf{P}^*$ and the dominant eigenvalue of $\mathbf{P}^*$. Otherwise, $E_p$ in the optimization problem (4.111) should be decreased.

An SDP-based iterative algorithm (Algorithm IV) is proposed to get the rank-1 optimal solution $\mathbf{P}^*$:

1. Initialization of $E_p$.

2. Solve the optimization problem (4.111) and get the optimal solution $\mathbf{P}^*$.

3. If the ratio of the dominant eigenvalue of $\mathbf{P}^*$ to $E_p$ is less than 0.9999, then set $E_p$ to be $\mathrm{trace}(\mathbf{P}^*)$ and go to step 2; otherwise, the algorithm is terminated.

The optimal waveforms can be obtained from the dominant eigenvector of $\mathbf{P}^*$ and Eq. (4.87).


Robust  Waveform 

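For the practical consideration of uncertainty, multipath impulse responses cannot be obtained exactly due to the limitation of the sounding system and feedback system or the perturbation of the radio environment and the fading of the radio channel. The norm of uncertainty for each multipath impulse response is assumed to be bounded by a known value.

$$\mathbf{h}_n = \mathbf{h}_n^0 + \Delta\mathbf{h}_n \tag{4.112}$$

and

$$\|\Delta\mathbf{h}_n\|_2 \le \varepsilon_n \tag{4.113}$$

where $\mathbf{h}_n^0$ is the nominal value of the multipath impulse response, $\Delta\mathbf{h}_n$ is the uncertainty of the multipath impulse response and $\varepsilon_n$ is the norm bound for the uncertainty.

Based on Eq. (4.88), Eq. (4.112) and Eq. (4.113), $\mathbf{h}$ can be reformulated as,

$$\mathbf{h} = \mathbf{h}^0 + \Delta\mathbf{h} \tag{4.114}$$

where,

$$\mathbf{h}^0 = \left[(\mathbf{h}_1^0)^T\ \cdots\ (\mathbf{h}_N^0)^T\right]^T \tag{4.115}$$

and

$$\Delta\mathbf{h} = \left[(\Delta\mathbf{h}_1)^T\ \cdots\ (\Delta\mathbf{h}_N)^T\right]^T \tag{4.116}$$

The norm of $\Delta\mathbf{h}$ satisfies

$$\|\Delta\mathbf{h}\|_2^2 = \sum_{n=1}^{N} \|\Delta\mathbf{h}_n\|_2^2 \le \sum_{n=1}^{N} \varepsilon_n^2 = \varepsilon^2 \tag{4.117}$$

From Eq. (4.89), maximization of $\mathbf{h}^T\mathbf{p}$ is the same as maximization of $|\mathbf{h}^T\mathbf{p}|$ as long as $\mathbf{h}^T\mathbf{p}$ is equal to or greater than zero. Based on the triangle inequality [38] [39],

$$\begin{aligned} |\mathbf{h}^T\mathbf{p}| &= \left|(\mathbf{h}^0 + \Delta\mathbf{h})^T \mathbf{p}\right| && (4.118) \\ &= \left|(\mathbf{h}^0)^T\mathbf{p} + \Delta\mathbf{h}^T\mathbf{p}\right| && (4.119) \\ &\ge \left|(\mathbf{h}^0)^T\mathbf{p}\right| - \left|\Delta\mathbf{h}^T\mathbf{p}\right| && (4.120) \end{aligned}$$

Meanwhile, according to the Cauchy-Schwarz inequality,

$$\left|\Delta\mathbf{h}^T\mathbf{p}\right| \le \|\Delta\mathbf{h}\|_2 \|\mathbf{p}\|_2 \le \varepsilon \|\mathbf{p}\|_2 \tag{4.121}$$

Hence,

$$|\mathbf{h}^T\mathbf{p}| \ge \left|(\mathbf{h}^0)^T\mathbf{p}\right| - \varepsilon\|\mathbf{p}\|_2 \tag{4.122}$$

and if the uncertainty is small compared with the nominal value of the multipath impulse response, i.e.,

$$\left|(\mathbf{h}^0)^T\mathbf{p}\right| - \varepsilon\|\mathbf{p}\|_2 \ge 0 \tag{4.123}$$

then

$$\begin{aligned} |\mathbf{h}^T\mathbf{p}|^2 &\ge \left(\left|(\mathbf{h}^0)^T\mathbf{p}\right| - \varepsilon\|\mathbf{p}\|_2\right)^2 && (4.124) \\ &= \left|(\mathbf{h}^0)^T\mathbf{p}\right|^2 - 2\varepsilon \left|(\mathbf{h}^0)^T\mathbf{p}\right| \|\mathbf{p}\|_2 + \varepsilon^2 \|\mathbf{p}\|_2^2 && (4.125) \\ &\ge \left|(\mathbf{h}^0)^T\mathbf{p}\right|^2 - 2\varepsilon \|\mathbf{h}^0\|_2 \|\mathbf{p}\|_2^2 + \varepsilon^2 \|\mathbf{p}\|_2^2 && (4.126) \end{aligned}$$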


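Define $\mathbf{H}^0 = \mathbf{h}^0 (\mathbf{h}^0)^T$ and reuse $\mathbf{P} = \mathbf{p}\mathbf{p}^T$; $|\mathbf{h}^T\mathbf{p}|^2$ can be further relaxed as,

$$\begin{aligned} |\mathbf{h}^T\mathbf{p}|^2 &\ge \mathrm{trace}\left(\mathbf{H}^0\mathbf{P}\right) - 2\varepsilon \|\mathbf{h}^0\|_2\, \mathrm{trace}(\mathbf{P}) + \varepsilon^2\, \mathrm{trace}(\mathbf{P}) && (4.127) \\ &= \mathrm{trace}\left(\tilde{\mathbf{H}}\mathbf{P}\right) && (4.128) \end{aligned}$$

where,

$$\tilde{\mathbf{H}} = \mathbf{H}^0 + \left(\varepsilon^2 - 2\varepsilon \|\mathbf{h}^0\|_2\right) \mathbf{I} \tag{4.129}$$

Thus,

$$\min_{\|\Delta\mathbf{h}\|_2 \le \varepsilon} |\mathbf{h}^T\mathbf{p}|^2 = \mathrm{trace}\left(\tilde{\mathbf{H}}\mathbf{P}\right) \tag{4.130}$$

For robust optimization, the worst-case performance should be guaranteed and the worst-case constraint should be satisfied. The optimization problem for robust waveform design can be formulated as an SDP,

$$\begin{aligned} \text{minimize}\ & \mathrm{trace}(\mathbf{P}) \\ \text{subject to}\ & \mathrm{trace}\left(\tilde{\mathbf{H}}\mathbf{P}\right) \ge \delta \\ & \mathrm{trace}\left(\mathbf{S}_n^T \mathbf{F}_i \mathbf{S}_n \mathbf{P}\right) \le c_{ni} \\ & i = 1, 2, \ldots, \frac{N_p}{2} + 1 \\ & n = 1, 2, \ldots, N \end{aligned} \tag{4.131}$$

where $\delta$ is the minimum peak energy needed for the receiver to meet QoS. If there is a rank-1 optimal solution $\mathbf{P}^*$ to the optimization problem (4.131), then the optimal waveforms can be obtained from the dominant eigenvector of $\mathbf{P}^*$ and Eq. (4.87); otherwise, there is no solution to the robust waveform design problem.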
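The worst-case bound behind Eq. (4.127) to Eq. (4.130) can be checked numerically. The helper names `robust_matrix` and `worst_case_bound` are assumptions for illustration; the sketch builds $\tilde{\mathbf{H}}$ from a nominal response $\mathbf{h}^0$ and a bound $\varepsilon$, and evaluates $\mathbf{p}^T\tilde{\mathbf{H}}\mathbf{p} = \mathrm{trace}(\tilde{\mathbf{H}}\mathbf{p}\mathbf{p}^T)$.

```python
import numpy as np

def robust_matrix(h0, eps):
    """H~ = h0 h0^T + (eps^2 - 2*eps*||h0||_2) I, following Eq. (4.129)."""
    h0 = np.asarray(h0, dtype=float)
    shift = eps ** 2 - 2.0 * eps * np.linalg.norm(h0)
    return np.outer(h0, h0) + shift * np.eye(h0.size)

def worst_case_bound(h0, eps, p):
    """Lower bound trace(H~ p p^T) on |h^T p|^2 for all ||h - h0||_2 <= eps."""
    p = np.asarray(p, dtype=float)
    return float(p @ robust_matrix(h0, eps) @ p)
```

The bound is only valid under condition (4.123), i.e. when $|(\mathbf{h}^0)^T\mathbf{p}| \ge \varepsilon\|\mathbf{p}\|_2$; for a waveform matched to $\mathbf{h}^0$ it is attained by the perturbation pointing against $\mathbf{p}$.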


Multi-User  Waveform 


Multi-user  scenario  will  be  more  complex  than  previous  single  user  scenario.  In  multi-user  scenario,  there  is  one 
transmitter  with  multiple  antennas  and  there  is  more  than  one  receiver.  Each  receiver  with  single  antenna  represents 
one  user.  These  users  are  distributed  and  have  no  cooperation  at  the  receiver  side.  Meanwhile,  all  the  receivers 
are  assumed  to  be  synchronized  to  the  transmitter.  The  transmitter  will  serve  all  the  receivers  based  on  the  type  of 
services.  The  transmitter  is  assumed  to  know  all  the  multipath  impulse  responses  between  transmitter  and  receivers. 
Thus  the  joint  waveform  design  and  optimization  will  be  performed  at  the  transmitter  side. 

Assume there are $M$ users in the multi-user scenario. The type of service is multi-cast, which means the same information is sent to all the users simultaneously and equally. If bit 1 is sent, then all the users will receive 1. If bit 0 is sent, then all the users will receive 0. Extended from Eq. (4.90), $H_m$ will contain the information on the multipath impulse response for user $m$. The optimization problem for the multi-cast service can be formulated as an SDP,

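$$
\begin{aligned}
\text{minimize}\quad & \operatorname{trace}(P) \\
\text{subject to}\quad & \operatorname{trace}(H_m P) \ge \delta_m, \quad m = 1, 2, \ldots, M \\
& \operatorname{trace}(S_n^T F_i S_n P) \le c_{ni}, \quad n = 1, 2, \ldots
\end{aligned}
\tag{4.132}
$$

where $\delta_m$ is the minimum peak energy needed for user $m$ to meet its QoS.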

If the optimization problem (4.132) can be solved with a rank-1 optimal solution $P^*$, then the optimal waveforms with minimum transmitted power can be obtained. Otherwise, there is no solution for the multi-cast service and the users cannot all be served simultaneously and satisfactorily. User selection should then be performed to choose a subset of users to be served. The goal of user selection is to maximize the number of users whose QoS is met. If the number of users in the scenario is small, exhaustive search can be used to choose the users. However, if the number of users is large, the computational cost of exhaustive search is prohibitive. In practice, greedy algorithms can be used for user selection.

Algorithms for user selection or admission control were discussed in [40] [41]. A greedy inflation approach, i.e., adding users, and a greedy deflation approach, i.e., deleting users, were mentioned. Similarly, an inflation-based greedy algorithm is proposed here to perform user selection:

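1. Set $G_{\mathrm{serve}} = \emptyset$ and $G_{\mathrm{init}} = \{1, 2, \ldots, M\}$.

2. For each $i \in G_{\mathrm{init}}$, formulate $G_i = G_{\mathrm{serve}} \cup \{i\}$ and solve the optimization problem (4.132) for the users in $G_i$. If there is a rank-1 optimal solution $P^*$, then $p_i = \operatorname{trace}(P^*)$; otherwise $p_i = +\infty$.

3. Let $i^* = \arg\min_i \{p_i\}$ and $p^* = \min_i \{p_i\}$. If $p^*$ is equal to $+\infty$, then the algorithm is terminated; otherwise, set $G_{\mathrm{serve}} = G_{i^*}$ and remove $i^*$ from $G_{\mathrm{init}}$. If $G_{\mathrm{init}} = \emptyset$, then the algorithm is terminated; otherwise go to step 2.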
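
The inflation steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `min_power` is a hypothetical oracle standing in for the SDP (4.132), returning trace(P*) when a rank-1 solution exists and infinity otherwise, and `toy_min_power` with its `COST` table is an invented feasibility model, not a result from this project.

```python
import math


def inflation_select(num_users, min_power):
    """Greedy inflation: grow the served set one user at a time.

    min_power(users) is a hypothetical oracle standing in for the SDP (4.132):
    it returns trace(P*) when a rank-1 solution exists and math.inf otherwise.
    """
    serve = set()                                # step 1: G_serve is empty
    remaining = set(range(1, num_users + 1))     # step 1: G_init = {1, ..., M}
    while remaining:
        # step 2: transmitted power needed if candidate i joins the served set
        powers = {i: min_power(serve | {i}) for i in sorted(remaining)}
        # step 3: admit the candidate with the smallest transmitted power
        best = min(powers, key=powers.get)
        if math.isinf(powers[best]):
            break                                # no further user can be admitted
        serve.add(best)
        remaining.remove(best)
    return serve


# Toy oracle (an assumption for illustration only): each user has a fixed power
# cost, and a user set is feasible while the total cost stays within a budget.
COST = {1: 1.0, 2: 5.0, 3: 2.0, 4: 4.0}


def toy_min_power(users, budget=7.5):
    total = sum(COST[u] for u in users)
    return total if total <= budget else math.inf


served = inflation_select(4, toy_min_power)
print(sorted(served))
```

Each round solves one problem per remaining candidate, so at most M(M+1)/2 problems are solved in total, compared with up to 2^M for exhaustive search.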

If the type of service is uni-cast, different information will be sent to different users simultaneously. For example, in the current bit duration, bit 1 is sent to user 1 and bit 0 is sent to user 2. All the users can be divided into two groups. Users in the first group $G_1$ will receive bit 1, and the corresponding peak energies should be equal to or greater than the pre-defined thresholds for bit 1. Users in the second group $G_2$ will receive bit 0, and the corresponding peak energies should be equal to or less than the lower thresholds for bit 0. $G_1$ and $G_2$ are the sets containing the indices of the users in the first and second groups, respectively, with $G_1 + G_2 = \{1, 2, \ldots, M\}$. The optimization problem for the uni-cast service can be formulated as an SDP,

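$$
\begin{aligned}
\text{minimize}\quad & \operatorname{trace}(P) \\
\text{subject to}\quad & \operatorname{trace}(H_{m_1} P) \ge \delta_{m_1}, \quad m_1 \in G_1 \\
& \operatorname{trace}(H_{m_2} P) \le \delta_{m_2}, \quad m_2 \in G_2 \\
& \operatorname{trace}(S_n^T F_i S_n P) \le c_{ni}, \quad i = 1, 2, \ldots; \quad n = 1, 2, \ldots
\end{aligned}
\tag{4.133}
$$

where $\delta_{m_1}$ is the minimum peak energy needed for user $m_1$ to receive bit 1 and $\delta_{m_2}$ is the maximum peak energy needed for user $m_2$ to receive bit 0.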

If the optimization problem (4.133) can be solved with a rank-1 optimal solution $P^*$, then the optimal waveforms with minimum transmitted power can be obtained. Otherwise, there is no solution for the uni-cast service and the users cannot all be served simultaneously and satisfactorily. User selection for uni-cast service is more complex than that for multi-cast service. The latter can be done once and for all, while the former needs to be done for each bit duration separately because $G_1$ and $G_2$ change dynamically. If so, the connectivity of a selected user cannot be guaranteed. For example, a user served in the current bit duration may be dropped in the next bit duration. Taking connectivity into account, user selection for uni-cast service should also be performed once and for all. Once a user is selected, this user will be served to the end. Meanwhile, the number of combinations of $G_1$ and $G_2$ grows rapidly with the number of users. Thus the complexity of the user selection algorithm should be well controlled. A deflation-based greedy algorithm is proposed here to perform user selection:

1. Set $G_{\mathrm{serve}} = \{1, 2, \ldots, M\}$.

2. Randomly choose $G_1$ and $G_2$ such that $G_1 + G_2 = G_{\mathrm{serve}}$ and $G_1 \cap G_2 = \emptyset$.

3. Solve the optimization problem (4.133). If there is a rank-1 optimal solution $P^*$, then go to step 2; otherwise, look for one user whose removal from $G_1$ or $G_2$ yields a rank-1 optimal solution $P^*$ with the minimum transmitted power. If such a user exists, remove this user from $G_{\mathrm{serve}}$ and go to step 2; otherwise randomly remove one user from $G_{\mathrm{serve}}$ and go to step 2.

The algorithm is terminated when: (1) for all randomly selected patterns of $G_1$ and $G_2$, the optimization problem (4.133) has a corresponding rank-1 optimal solution $P^*$; or (2) $G_{\mathrm{serve}} = \emptyset$.
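
A minimal Python sketch of the deflation procedure follows. It rests on several assumptions that are not part of the original algorithm statement: `solve` is a hypothetical oracle standing in for the SDP (4.133), the number of random bit patterns checked per round is capped by an invented `trials` parameter (the text only requires that "all randomly selected patterns" be feasible), and `toy_solve` is a made-up feasibility model.

```python
import random


def deflation_select(num_users, solve, trials=20, seed=0):
    """Greedy deflation: start with every user and drop users until feasible.

    solve(g1, g2) is a hypothetical oracle standing in for the SDP (4.133):
    it returns the transmitted power when a rank-1 solution exists, else None.
    trials (an assumption) bounds the random bit patterns checked per round.
    """
    rng = random.Random(seed)
    serve = set(range(1, num_users + 1))           # step 1
    while serve:
        failed = None
        for _ in range(trials):
            # step 2: random split of the served users into bit-1 / bit-0 groups
            g1 = {u for u in sorted(serve) if rng.random() < 0.5}
            g2 = serve - g1
            if solve(g1, g2) is None:              # step 3: pattern infeasible
                failed = (g1, g2)
                break
        if failed is None:
            return serve                           # termination condition (1)
        g1, g2 = failed
        # try every single-user removal and keep the one with minimum power
        options = {}
        for u in sorted(serve):
            power = solve(g1 - {u}, g2 - {u})
            if power is not None:
                options[u] = power
        if options:
            drop = min(options, key=options.get)
        else:
            drop = rng.choice(sorted(serve))       # no single removal helps
        serve.remove(drop)
    return serve                                   # termination condition (2)


# Toy oracle (illustrative assumption): user 3 can never be served.
def toy_solve(g1, g2):
    users = g1 | g2
    return None if 3 in users else float(len(users))


print(sorted(deflation_select(5, toy_solve)))
```

Because users are only ever removed, a user that survives to the end is served in every bit duration, which is the connectivity requirement discussed above.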


4.2  A  Compressed  Sensing  Based  Ultra-Wideband  Communication  System 

4.2.1  Introduction 

Ultra-wideband (UWB) [3,42-44] represents a new paradigm in wireless communication. The unprecedented radio bandwidth provides advantages such as immunity from flat fading. Two primary challenges exist: (1) how to collect energy over the rich multipath components; (2) the extremely high sampling rate required for analog-to-digital conversion (A/D). Time reversal [45] provides a promising solution to the first problem [46]. In particular, the concept of time reversal has recently been demonstrated in a real-time hardware test-bed [47,48]. At the heart of time reversal, the channel itself is exploited as a part of the transceiver. This idea makes sense because, when there is little movement, the channel is time-invariant and reciprocal [49]. In principle, most of the processing at the receiver can be moved to the transmitter, where energy and computation resources are sufficient for many advanced algorithms.

A natural question arises: can we move the hardware complexity of the receiver to the transmitter side in order to reduce the A/D sampling rate to the level of 125 Msps, for which excellent high-dynamic-range commercial solutions are available? Fortunately, compressed sensing (CS) [50,51] is a natural framework for our purpose.

CS has been applied to UWB communications [52,53]. Our major contribution is to exploit the channel itself as part of compressed sensing, through waveform-based pre-coding at the transmitter. Only one low-rate A/D is used at the receiver. We have also demonstrated (Fig. 4.3) a UWB system covering the 3-8 GHz frequency band that, with conventional sampling technology, would take the industry decades to reach.

This section is organized as follows. Section 4.2.2 introduces the CS theory background and extends the CS concept to a continuous-time filter-based architecture. Section 4.2.3 describes the proposed CS-based UWB system together with a CS-based channel estimation method. Section 4.2.4 shows the simulation results and Section 4.2.5 gives the conclusions.


4.2.2  Compressed  Sensing  for  Communications 
Compressed  sensing  background 

Reference [54] gives a succinct summary of the CS principles and is followed here to give a flavor of this elegant theory. Consider the problem of reconstructing an $N \times 1$ signal vector $x$. Suppose the basis $\Psi = [\psi_0, \psi_1, \ldots, \psi_{N-1}]$ provides a $K$-sparse representation of $x$, where $K \ll N$; that is

$$
x = \sum_{n=0}^{N-1} \psi_n \theta_n = \sum_{l=1}^{K} \psi_{n_l} \theta_{n_l}.
\tag{4.134}
$$

Here $x$ is a linear combination of $K$ vectors chosen from $\Psi$; $\{n_l\}$ are the indices of those vectors; $\{\theta_{n_l}\}$ are the coefficients. Alternatively, we can write in matrix notation

$$
x = \Psi\theta,
\tag{4.135}
$$

where $\theta = [\theta_0, \theta_1, \ldots, \theta_{N-1}]^T$. In CS, $x$ can be reconstructed successfully from $M$ measurements with $M \ll N$.

The measurement vector $y$ is obtained by projecting $x$ onto another basis $\Phi$ which is incoherent with $\Psi$, i.e., $y = \Phi\Psi\theta$.

The reconstruction problem becomes an $\ell_1$-norm optimization problem:

$$
\hat{\theta} = \arg\min \|\theta\|_1 \quad \text{s.t.} \quad y = \Phi\Psi\theta.
\tag{4.136}
$$

This  problem  can  be  solved  by  linear  programming  techniques  like  basis  pursuit  (BP)  or  greedy  algorithms  such  as 
matching  pursuit  (MP)  and  orthogonal  matching  pursuit  (OMP). 
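
As a rough illustration of the greedy route, the following Python sketch implements a basic OMP and applies it to a one-sparse toy problem. The random Gaussian matrix with unit-norm columns, the dimensions (256 unknowns, 64 measurements) and the pulse position 178 are all assumptions chosen for illustration; `A` plays the role of the combined matrix $\Phi\Psi$.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal matching pursuit for y = A @ theta with at most k non-zeros."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # select the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit on the selected columns, then update the residual
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    theta = np.zeros(A.shape[1])
    theta[support] = coef
    return theta


rng = np.random.default_rng(0)
N, M, K = 256, 64, 1                      # unknowns, measurements, sparsity
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)            # unit-norm columns
theta_true = np.zeros(N)
theta_true[178] = 1.0                     # single non-zero coefficient
y = A @ theta_true                        # noiseless measurements
theta_hat = omp(A, y, K)
print(int(np.argmax(np.abs(theta_hat))))
```

With unit-norm columns and a single non-zero entry, the first correlation step necessarily picks the correct column, so this case is recovered exactly; for larger K, recovery depends on the incoherence of the matrix and on M.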

When applying CS theory to communications, the sampling rate can be reduced to a sub-Nyquist rate. In [55] and [56] a serial and a parallel system structure were proposed, respectively. The sampling rate can be reduced to less than 20% of the Nyquist rate. However, they were designed for signals that are sparse in the frequency domain. In this section we propose a serial system structure suitable for pulse-based UWB communications, whose signals are sparse in the time domain. The analog-to-information converter (AIC) structure in [55] is not suitable for UWB communications. A 3-8 GHz UWB signal is taken as an example to explain why:


•  The pseudo-noise (PN) chip rate, which must be at least twice the maximum signal frequency, is difficult to achieve for UWB signals. For example, a 3-8 GHz UWB signal needs a chip rate of at least 16 GHz.

•  The multiplier, which can be a mixer, is difficult to implement with the bandwidth needed for a 3-8 GHz UWB signal.

•  The system is time-variant. Each measurement is the product of a streaming signal and a changing PN sequence. This requires a huge amount of storage space and complex computation.


In  this  section,  a  simple  architecture  that  is  suitable  for  UWB  signals  is  proposed  using  a  finite  impulse  response 
(FIR)  filter-based  architecture. 


Filter-based  compressed  sensing 


A random-filter-based CS system for discrete-time signals was proposed in [57]. This idea can be extended to continuous-time signals. We use $*$ to denote the convolution process in a linear time-invariant (LTI) system. Assume that there is an analog signal $x(t)$, $t \in [0, T_x]$, which is $K$-sparse over some basis $\Psi(t)$:

$$
x(t) = \sum_{n=0}^{N-1} \Psi_n(t)\,\theta_n = \Psi(t)\,\theta,
\tag{4.137}
$$

where

$$
\Psi(t) = [\Psi_0(t), \Psi_1(t), \ldots, \Psi_{N-1}(t)],
\tag{4.138}
$$

$$
\theta = [\theta_0, \theta_1, \ldots, \theta_{N-1}]^T.
\tag{4.139}
$$

Note that there are only $K$ non-zeros in $\theta$. $x(t)$ is then fed into a length-$L$ FIR filter $h(t)$:

$$
h(t) = \sum_{i=0}^{L-1} h_i\,\delta(t - iT_h),
\tag{4.140}
$$

where $T_h$ is the time delay between each filter tap.

The output $y(t) = h(t) * x(t)$ is then uniformly sampled with sampling period $T_s$. $T_s$ follows the relation $T_s / T_h = q$, where $q$ is a positive integer. $M$ samples are collected so that $M \cdot T_s = \lfloor L \cdot T_h + T_x \rfloor$, where $(L \cdot T_h + T_x)$ is the duration of $y(t)$.

Now we have the down-sampled output signal $y(mT_s)$, $m = 0, 1, \ldots, M-1$:

$$
\begin{aligned}
y(mT_s) &= h(mT_s) * x(mT_s) \\
&= \int_0^{T_x} h(mT_s - \tau)\, x(\tau)\, d\tau \\
&= \int_0^{T_x} \sum_{i=0}^{L-1} h_i\, \delta(mT_s - iT_h - \tau)\, x(\tau)\, d\tau \\
&= \sum_{i=0}^{L-1} h_i\, x(mqT_h - iT_h),
\end{aligned}
$$

which, stacked over $m$, gives

$$
y = \Phi x,
\tag{4.141}
$$

where $\Phi$ is a quasi-Toeplitz matrix and

$$
x = [x(0), x(T_h), \ldots, x((M-1)qT_h)]^T = \Psi\theta,
\tag{4.142}
$$

$$
\Psi = [\Psi(0), \Psi(T_h), \ldots, \Psi((M-1)qT_h)]^T.
\tag{4.143}
$$

A quasi-Toeplitz matrix has the following property: each row of $\Phi$ has $L$ non-zero entries and each row is a copy of the row above, shifted right by $q$ places.
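
To make the quasi-Toeplitz structure concrete, the Python sketch below builds $\Phi$ from a set of filter taps and checks it against direct convolution followed by down-sampling. The helper name `quasi_toeplitz`, the PN-like taps and the sizes are assumptions for illustration, and the row count is taken as the number of every-$q$-th samples of the full discrete convolution, which may differ slightly from the floor expression in the text.

```python
import numpy as np


def quasi_toeplitz(h, q, n):
    """Matrix Phi such that Phi @ x equals conv(h, x) kept at every q-th sample."""
    h = np.asarray(h, dtype=float)
    L = h.size
    M = -(-(n + L - 1) // q)              # ceil((n + L - 1) / q) output samples
    Phi = np.zeros((M, n))
    for m in range(M):
        for i in range(L):
            j = m * q - i                 # column hit by tap h_i in row m
            if 0 <= j < n:
                Phi[m, j] = h[i]
    return Phi


rng = np.random.default_rng(0)
L, q, n = 16, 4, 32
h = rng.choice([-1.0, 1.0], size=L)       # PN-like +/-1 taps (assumption)
x = np.zeros(n)
x[[3, 20]] = [1.0, -0.5]                  # sparse input sampled at spacing T_h
Phi = quasi_toeplitz(h, q, n)
y = Phi @ x                               # measurements at spacing T_s = q * T_h
print(Phi.shape)
```

Rows near the top and bottom of the matrix are truncated at the edges and therefore carry fewer than L non-zero entries, which is one reason the matrix is only "quasi" Toeplitz.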

Let $y_m = y(mT_s)$; we have

$$
y = [y_0, y_1, \ldots, y_{M-1}]^T.
\tag{4.144}
$$

Combining Equations 4.137, 4.138, 4.139, 4.141, 4.142, 4.143 and 4.144, we have:

$$
y = \Phi\Psi\theta = \Theta\theta.
\tag{4.145}
$$

Now the problem becomes recovering the $N \times 1$ vector $\theta$ from the $M \times 1$ measurement vector $y$, which is exactly the same as the problem posed in Equation 4.136. The number of measurements for successful recovery depends on the sparsity $K$, the duration of the analog signal $T_x$, the filter length $L$ and the incoherence between $\Phi$ and $\Psi$. Numerical results in Section 4.2.3 show that when $x(t)$ is sparse and $h(t)$ is a PN sequence, $\theta$ can be reconstructed successfully with a reduced sampling rate, requiring only $M \ll N$ measurements. Note that the measurement $y$ is a projection of $x$ via an FIR filter. We use this feature to design our proposed system.


4.2.3 Compressed Sensing Based UWB Communication System


Figure 4.3: The system architecture of the proposed CS based UWB system. The communication problem of recovering the transmitted information can be modeled as a CS problem.


Communication  system  architecture 

With the background of Section 4.2.2, we propose a CS-based UWB communication system which is able to reduce the sampling rate to 1.25% of the Nyquist rate. The system architecture is illustrated in Fig. 4.3. A UWB signal is transmitted by feeding a sparse bit sequence through a UWB pulse generator and a pre-coding filter. The received signal is then directly sampled after the channel using a low-rate A/D and processed by a recovery algorithm. $\Phi$ is the projection matrix consisting of the pre-coding filter and the channel. Note that the channel itself is part of the projection matrix in CS, so the receiver is very simple, with only one low-rate A/D to collect measurement samples. Our simulation in Section 4.2.4 shows that 3-8 GHz UWB signals can be successfully recovered by a 125 Msps A/D.

$K$-pulse position modulation (PPM) is used to modulate the sparse bit sequence. Each PPM symbol is $K$-sparse: there are $N$ positions and only $K \ll N$ pulses in each symbol, as illustrated in Fig. 4.4. The output of the UWB pulse generator can be written using the notation of Equations 4.137 and 4.138, with $\Psi_n(t) = p(t - nT_p)$, where $p(t)$ is the function of the UWB pulse and $T_p$ is the period of the pulse. The pre-coding filter and the channel are modeled as FIR filters with combined impulse response $h(t) = f(t) * c(t)$, where $f(t)$ and $c(t)$ are the impulse responses of the pre-coding filter and the channel, respectively. Here $h(t)$ is equivalent to the $h(t)$ in Equation 4.140. The received signal $y(t) = h(t) * x(t)$ is then uniformly sampled by an A/D with sampling period $T_s$. Similar to Equations 4.141 and 4.144, the down-sampled measurements form the $M \times 1$ vector $y = \Phi\Psi\theta = \Theta\theta$, where $\Phi$ is a quasi-Toeplitz matrix. Now the communication problem becomes a problem of estimating $\theta$ from $M \ll N$ measurements, which is again identical to the problem described by Equation 4.136.

The success of recovery relies on the sparsity $K$ and the incoherence between $\Phi$ and $\Psi$. Sparsity is easily met by controlling the transmitted sequence. In our simulation, $K = 1$, which means that there is only one pulse in each PPM symbol. The incoherence property can be met by proper selection of the pre-coding filter $f(t)$. Simulation results show that if $f(t)$ is a PN sequence whose chip rate is equal to the bandwidth of the UWB pulse $p(t)$, $\theta$ can be successfully recovered using recovery algorithms. So far the discussion is in baseband. If the transmitted UWB signal is passband, then up-conversion is applied after the pre-coding filter. The PN chip rate and the receiver structure remain the same. No down-conversion is required at the receiver. For example, as will be shown in the simulation, a 3-8 GHz UWB pulse requires a 5 GHz PN chip rate, which is the same as the signal bandwidth, not the Nyquist rate of the maximum signal frequency as required by the AIC system. The A/D at the receiver directly samples the received signal without down-conversion.

Figure 4.4: The structure of the transmitted symbol. This symbol is $K$-sparse. There are $K$ pulses in $N$ positions.

Figure 4.5: Block diagram of channel estimation

The  number  of  measurements  M  and  sampling  rate  are  related  and  determined  by  the  length  of  the  combined  filter 
h{t).  If  h{t)  is  long,  the  received  signal  is  “spread  out”  in  the  time  domain,  therefore  sufficient  measurements  can 
be  made  under  a  lower  sampling  rate. 

Channel  estimation 

After down-sampling, $y$ is processed at the receiver with $\Theta$ using BP. In constructing $\Theta$, $f(t)$, $c(t)$ and $\Psi$ are required. $f(t)$ and $\Psi$ are fixed and can be considered as prior knowledge at the receiver. The channel, $c(t)$, however, needs to be estimated. A CS-based channel estimation method is proposed. A 3-8 GHz channel can be estimated by a 500 Msps A/D.

Similar to Equation 4.140, the UWB channel can be modeled as:

$$
c(t) = \sum_{i=0}^{L-1} c_i\,\delta(t - iT_h).
\tag{4.146}
$$

The channel estimation block diagram is illustrated in Fig. 4.5. A UWB probing pulse $p(t) * f(t)$ is transmitted to "probe" the channel, where $p(t)$ is a UWB pulse and $f(t)$ is a PN sequence. At the receiver, a sub-Nyquist rate A/D collects $M$ uniform measurements. This process can be represented as $y = D\downarrow(c(t) * f(t) * p(t))$, where $D\downarrow$ denotes down-sampling by a factor of $\lfloor N/M \rfloor$ and $y$ denotes the measurement vector. Since the system is LTI, an alternative block diagram can be drawn as Fig. 4.6. Then, $y = D\downarrow((f(t) * p(t)) * c(t))$. In matrix notation, $y = \Theta c$, where $\Theta$ is a quasi-Toeplitz matrix derived from $f(t) * p(t)$ and $c = [c_0, c_1, \ldots, c_{L-1}]^T$. The channel estimation problem is to get $c$ from the measurements $y$, which is identical to the CS problem described in Equation 4.136.
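
The equivalence between the two block diagrams, and the resulting matrix form $y = \Theta c$, can be checked numerically. The Python sketch below is a discrete-time illustration only: the Gaussian pulse samples, the 32-chip PN sequence, the three channel taps and the down-sampling factor of 4 are all assumed values, not the parameters used in this project.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.exp(-0.5 * np.linspace(-2.0, 2.0, 9) ** 2)   # stand-in Gaussian pulse samples
f = rng.choice([-1.0, 1.0], size=32)                # stand-in PN sequence
c = np.zeros(40)
c[[0, 7, 19]] = [1.0, 0.5, -0.3]                    # sparse channel taps (assumed)
step = 4                                            # down-sampling factor (assumed)

# Fig. 4.5 ordering: the probing waveform passes through the channel, then the
# output is down-sampled.
y_a = np.convolve(c, np.convolve(f, p))[::step]

# Fig. 4.6 ordering in matrix form: column j holds (f * p) delayed by j taps,
# and down-sampling keeps every step-th row.
probe = np.convolve(f, p)
full = np.zeros((probe.size + c.size - 1, c.size))
for j in range(c.size):
    full[j:j + probe.size, j] = probe
Theta = full[::step]
y_b = Theta @ c

print(Theta.shape, np.allclose(y_a, y_b))
```

Here 20 measurements are taken of a 40-tap channel vector, so the system is underdetermined and the sparsity of `c` is what makes recovery possible.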

Successful recovery requires $c$ to be sparse and the measurement matrix $\Theta$ to have the incoherence property [51]. The indoor UWB channel is sparse, and a PN-sequence-structured $\Theta$ has the incoherence property. The PN chip rate should be the same as the bandwidth of the channel under estimation. We use simulation to show the estimation result.

Figure 4.6: An equivalent block diagram of channel estimation

Figure 4.7: Time domain channel derived from VNA measurement. The sparsity of this channel is 50.

First, we need to set up the real channel $c(t)$ as the estimation target. A vector network analyzer (VNA) is used to get the real indoor channel coefficients $c$. The 3-8 GHz channel is measured by the VNA with a 1 MHz frequency step and 128 averages. $c(t)$ (Fig. 4.7) is derived from the VNA data using the CLEAN algorithm with a rectangular window. There are about 50 non-zero entries in $c$. The PN chip rate is 5 GHz and the length of $f(t)$ is 1 μs. The baseband Gaussian UWB pulse $p(t)$ has 5 GHz bandwidth. Since the measured channel is in passband, up-conversion is applied after the PN filter. At the receiver, a 500 Msps A/D is used to get the measurements. BP is then used to get the estimated vector $\hat{c}$ with the knowledge of $f(t)$, $p(t)$ and $y$ only. Additive white Gaussian noise (AWGN) is added to the received samples as $y = \Theta c + w$, where $w$ is the noise vector. Basis pursuit denoising (BPDN) is used to solve the recovery problem with noise. Fig. 4.8 (a) shows the estimation result and Fig. 4.8 (b) shows the zoomed-in result. It can be seen that, though $\hat{c}$ is a little noisy, all major paths in $\hat{c}$ match those in $c$. Only the amplitudes are slightly different.
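
For readers who want to experiment with noisy recovery, the sketch below uses ISTA (iterative soft thresholding), a simple proximal-gradient method for the $\ell_1$-regularized least-squares form of BPDN. It is a stand-in and not the solver used in this work (which is the BPDN code from [58]); the PN-structured matrix, the three-tap channel, the noise level and the choice of the regularization weight `lam` are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(A, y, lam, n_iter=500):
    """Minimise 0.5*||y - A c||^2 + lam*||c||_1 by proximal gradient steps."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / (largest singular value)^2
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        c = soft_threshold(c + step * A.T @ (y - A @ c), step * lam)
    return c


def objective(A, y, c, lam):
    return 0.5 * np.sum((y - A @ c) ** 2) + lam * np.sum(np.abs(c))


rng = np.random.default_rng(1)
n_taps, q = 200, 4
probe = rng.choice([-1.0, 1.0], size=64)        # stand-in for samples of f(t) * p(t)
full = np.zeros((n_taps + probe.size - 1, n_taps))
for j in range(n_taps):
    full[j:j + probe.size, j] = probe           # column j: probe delayed by j taps
Theta = full[::q]                               # sub-Nyquist sampling of the output
c_true = np.zeros(n_taps)
c_true[[10, 35, 90]] = [1.0, -0.6, 0.3]         # sparse channel (assumed)
y = Theta @ c_true + 0.01 * rng.standard_normal(Theta.shape[0])
lam = 0.1 * np.max(np.abs(Theta.T @ y))         # heuristic weight (assumption)
c_hat = ista(Theta, y, lam)
print(np.flatnonzero(np.abs(c_hat) > 0.1))
```

The soft threshold biases the recovered amplitudes towards zero, which is in line with the observation above that the estimated path amplitudes differ slightly from the true ones.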

We will use $c$ as the "perfect estimation" and the noisy $\hat{c}$ as the "imperfect estimation" to form the measurement matrix $\Phi$ in the symbol error rate simulation of the CS-based UWB communication system. Interestingly, though the imperfect estimation is noisy, its symbol error rate is similar to that of the perfect estimation.


4.2.4  Simulation  Results 

Since the UWB channel is stable when there is little movement in the indoor environment, we assume that the channel is time-invariant during the channel estimation and communication process.

In the simulation, each symbol has only one pulse in 256 candidate positions, carrying 8 bits of information. This is a special case of the PPM symbol illustrated in Fig. 4.4, with $K = 1$. More pulses can be used to increase the information per symbol. Since the purpose of this work is to recover $\theta$, not to maximize the data rate, the case $K > 1$ is not simulated. The UWB pulse generator produces a Gaussian pulse with 5 GHz bandwidth. The pre-coding filter is a PN sequence with 128 ns duration and 5 GHz chip rate. The signal is then modulated with a 5.5 GHz sinusoid, up-converting it to the 3-8 GHz frequency band. The measured channel $c(t)$ in Fig. 4.7 is used. Due to the delay spread of the channel and the length of the PN sequence, the received signal $y(t)$ is spread out over 256 ns. A 256 ns guard period is added between symbols to avoid intersymbol interference (ISI). At the receiver, perfect synchronization is assumed. Sampling rates of 125 Msps, 250 Msps and 500 Msps are simulated to evaluate the symbol error rate versus SNR per symbol. Since measurements are made in a 512 ns period, the corresponding numbers of measurements $M$ for 125 Msps, 250 Msps and 500 Msps are 64, 128 and 256, respectively. BPDN provided by [58] is used as the reconstruction algorithm. The position with maximum amplitude in $\hat{\theta}$ is compared with the position in $\theta$. If the two positions are exactly the same, the symbol is considered to be reconstructed successfully. 20000 simulations were performed for each SNR point.

Figure 4.8: (a) Channel estimation result, (b) Zoomed in version of the result.

Figure 4.9: Recovery of $\theta$ using 125 Msps A/D, perfect/imperfect channel estimation. No noise is added.
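
The bookkeeping of this simulation setup (8 bits per one-pulse symbol, maximum-amplitude decision, and the number of measurements per sampling rate) can be sketched as follows. The MSB-first bit-to-position mapping and the helper names are assumptions for illustration; the text does not specify how bits are mapped to pulse positions.

```python
import numpy as np

N_POSITIONS = 256          # candidate pulse positions per symbol (K = 1)
WINDOW_NS = 512            # measurement period stated in the text


def encode(bits):
    """Map 8 bits to a one-sparse position vector (MSB-first mapping is assumed)."""
    position = int("".join(str(b) for b in bits), 2)
    theta = np.zeros(N_POSITIONS)
    theta[position] = 1.0
    return theta


def decode(theta_hat):
    """Pick the position with maximum amplitude and map it back to 8 bits."""
    position = int(np.argmax(np.abs(theta_hat)))
    return [int(b) for b in format(position, "08b")]


def num_measurements(rate_msps, window_ns=WINDOW_NS):
    """Samples collected in the window: rate [Msps] * window [ns] / 1000."""
    return rate_msps * window_ns // 1000


bits = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = decode(encode(bits) + 0.2)       # constant offset as a crude disturbance
counts = {rate: num_measurements(rate) for rate in (125, 250, 500)}
print(recovered == bits, counts)
```

Only the index of the largest entry matters for the decision, so amplitude errors in the reconstruction do not cause a symbol error unless a spurious entry overtakes the true pulse position.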

Fig. 4.9 shows the reconstruction result at 125 Msps sampling rate without any additive noise. The $\hat{\theta}$ reconstructed from the perfect/imperfect channel profile is compared with the original $\theta$. From Fig. 4.9, we can see that $\hat{\theta}$ from the imperfect channel profile is slightly noisier. However, the position of maximum amplitude is exactly the same as the one in $\theta$. Therefore, the transmitted symbol is recovered successfully. After 20000 simulations, no errors were found.

Figure 4.10: Simulation results of symbol error rate vs SNR per symbol at the receiver.

Fig. 4.10 shows the results under AWGN. Perfect/imperfect channel estimation and 125 Msps/250 Msps/500 Msps sampling rates are simulated. A higher sampling rate provides better symbol error rate performance, because more measurements are collected at a higher sampling rate. It is also noticed that, at the same sampling rate, the symbol error rates with perfect and imperfect channel estimation are almost identical. From the 500 Msps curve, we can see that no symbol errors occur in 20000 simulations when the SNR is above -6 dB.

Many interesting questions are raised by the simulation. Do the measurement matrix $\Phi$ and the basis $\Psi$ satisfy the incoherence property in theory? What is the relationship among sampling rate, SNR and symbol error rate? Why does imperfect channel estimation show performance similar to perfect channel estimation? How can synchronization be achieved in the system? Further effort is needed to answer these questions.


4.2.5  Conclusions 


Our proposed approach is to form the projection matrix from the channel itself and waveform-based pre-coding at the transmitter. Taking the channel as part of CS results in a very simple receiver design, with only one low-rate A/D. The pre-coding is implemented in a natural way using an FIR filter. The concept has been demonstrated through simulations using real-world measurements. Realistic channel estimation is also considered. The philosophy is to trade computation complexity for hardware complexity, and to move receiver complexity to the transmitter.

This work is just the beginning of pre-coded CS. Future work includes reduction of algorithm complexity. Much faster algorithms are required for real-time applications such as UWB communications. Deterministic CS [59] will be considered in the context of the UWB channel.

Traditional pre-coding optimizes the system in the digital domain. Waveform-based pre-coding optimizes the transceiver in both the mixed-signal and digital domains. CS provides a natural framework. It may be more natural to combine waveform-based pre-coding with sampling innovation [60], since pre-coding can be used to reduce the degrees of freedom and thus the sampling rate.

4.4 UWB MIMO Testbed


For decades, multiple-antenna technology has been an important field of communications research [61]. It has become a key technology for wireless local area networks (WLANs), wireless metropolitan area networks (WMANs), and cellular mobile communication systems (3G, 4G), because it promises greater coverage, higher data rates, and improved link robustness by adding a spatial dimension to the time, frequency, and code dimensions [62]. Intensive theoretical research has been performed on multiple-antenna systems, often based on simplified models of reality. There are few practical implementations to verify the theoretical gain of a multiple-antenna system, especially for wideband multiple-antenna systems, which introduce many more challenges than narrowband systems. The need for accurate performance investigation over realistic, imperfect channels has motivated the design of a real-time radio testbed. In the authors' group, MISO UWB characteristics and performance have been well studied and various results have been demonstrated in [63] [64]. Furthermore, a simple 2-by-1 MISO UWB testbed based on the time reversal system has already been presented in the last quarterly report. It employs an FPGA for fast parallel processing of the transmitted data, and the operation and performance of this MISO UWB radio testbed are investigated for the case of wideband waveform-level precoding.


4.4.1 2-by-1 UWB MISO Testbed Description

The whole testbed is a module-based system as described in [65]. Figure 4.11 shows the system architecture, where there are two transmit channels at the transmitter side and only one receive channel at the receiver side. As the most powerful device for parallel signal processing, the FPGA plays a critical role in the system: all waveform algorithms are implemented in the FPGA. The two digital outputs from the FPGA board, each with both I and Q phases, are sent to two identical dual-channel digital-to-analog converter (DAC) boards, which are capable of 1 Gsample/s and 14 bits of precision. In the FPGA design of the transmitter, there are two waveform generator modules which support both I-phase and Q-phase modulation. The receiver is based on energy detection, a low-complexity reception technique which eliminates the need for channel estimation and precise synchronization.


4.4.2  Potential  UWB  MIMO  Testbed 

Another RF receiver chain will be needed to evolve the current 2-by-1 MISO testbed into a 2-by-2 MIMO testbed, and the baseband processing unit also needs to be able to receive two RF signals. The RF solution is the wideband demodulator ADL5380 and the dual-channel digital gain trim amplifier AD8366, both from Analog Devices, Inc., which are described in the appendix.

For the baseband processing solution, we have already purchased the EV8AQ160 quad ADC evaluation board manufactured by e2v; Fig. 4.12 shows a picture of the board. The quad ADC consists of four 8-bit ADC cores which can be used independently (four-channel mode) or grouped (two-channel mode with the ADCs interleaved two by two, or one-channel mode where all four ADCs are interleaved).

The EV8AQ160-EB evaluation board is very straightforward: it implements the e2v EV8AQ160 quad 8-bit 1.25 Gsps ADC device, an Atmel ATMEGA128 AVR, SMA connectors for the sampling clock, analog inputs and reset inputs, and 2.54 mm pitch connectors compatible with high-speed acquisition system probes. Thanks to its user-friendly interface, the EV8AQ160-EB kit enables testing of all the functions of the EV8AQ160 quad 8-bit 1.25 Gsps ADC using the SPI connected to a PC.


Figure 4.11: MISO UWB testbed architecture (panels: Transmitter, Receiver)


Figure  4.12:  Picture  of  EV8AQ160-EB  Evaluation  Board 


Chapter  5 


Waveform Diversity


This project has driven our investigation of waveform diversity and has helped reveal both its importance and
some insight into it. Waveform diversity is currently a key research topic in wireless communications, radar,
sensing, and imaging. A waveform should be designed or optimized according to the requirements or objectives
at hand, and it should be adapted dynamically to the operating environment in order to achieve a performance
gain [66]. For example, the waveform should be designed to carry more information to the receiver in terms of
capacity. If an energy detector is employed at the receiver, the waveform should be optimized such that the signal
energy within the integration window at the receiver is maximized [1] [34] [67] [37]. For navigation and
geolocation, an ultra-short waveform should be used to increase the resolution. For multi-target identification,
the waveform should be designed so that the radar returns bring back more information. In clutter-dominated
environments, maximizing the target energy while minimizing the clutter energy should be considered. In electronic
warfare, anti-jamming is critical to the outcome. Although anti-jamming is currently performed at the receiver,
if the transmitter has some information about the jamming seen at the receiver, for example its second-order
statistics, the transmitter can design the waveform to help the receiver cancel the jamming signal and to reduce
system power consumption.
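
The energy-detector case can be sketched numerically. The block below assumes a real discrete-time channel `h`, a unit-energy transmit waveform `p`, and an integration window given as a range of received-sample indices; under these assumptions the window energy is a quadratic form in `p` and is maximized by the principal eigenvector of the windowed convolution matrix. This is one standard formulation and not necessarily the one used in [1] [34] [67] [37]; the toy channel and all names are illustrative.

```python
import numpy as np

def conv_matrix(h, n_taps):
    """Matrix C with C @ p == np.convolve(h, p) for any p of length n_taps."""
    C = np.zeros((len(h) + n_taps - 1, n_taps))
    for k in range(n_taps):
        C[k:k + len(h), k] = h
    return C

def max_window_energy_waveform(h, n_taps, start, stop):
    """Unit-energy waveform maximizing received energy in samples [start, stop)."""
    Cw = conv_matrix(h, n_taps)[start:stop, :]
    eigvals, eigvecs = np.linalg.eigh(Cw.T @ Cw)   # eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1]

def window_energy(h, p, start, stop):
    """Received energy inside the integration window for waveform p."""
    return float(np.sum(np.convolve(h, p)[start:stop] ** 2))

rng = np.random.default_rng(0)
h = rng.standard_normal(12) * np.exp(-0.2 * np.arange(12))   # toy multipath channel
p_opt, e_opt = max_window_energy_waveform(h, n_taps=12, start=10, stop=13)

p_tr = h[::-1] / np.linalg.norm(h)   # time-reversed channel, unit energy
e_tr = window_energy(h, p_tr, 10, 13)
```

Comparing `e_opt` with `e_tr` shows how much a window-aware design can gain over plain time reversal for the same transmit energy; the two coincide only in special cases.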

A Multiple Input Single Output (MISO) system is a multi-antenna system with multiple antennas at the transmitter
and one antenna at the receiver. A MISO system can exploit spatial diversity and perform transmit beamforming
to focus energy on a desired direction or point and avoid interference to other radio systems. Waveform and
spatial diversity are practical today thanks to the advent of lightweight digitally programmable waveform
generators [68], or arbitrary waveform generators. Waveform diversity can also be applied to wideband systems.
A variety of waveforms with different time-frequency characteristics can be explored in wideband systems to
support high data rates and physical layer security for radio communications, as well as high resolution and
accuracy for radars. Waveform design/optimization for wideband multiple-antenna systems is documented in [69].
From a theoretical point of view, the contribution of [69] can be summarized as follows: (1) the equivalent
baseband waveforms are designed for the passband system; (2) the waveforms for the different transmitter
antennas are jointly optimized to reach globally optimal performance. The signals from the different transmitter
antennas combine over the air, so that the receiver antenna sees only a single copy of the signal no matter how
many transmitter antennas exist. To achieve this over-the-air phase coherency for the passband signals, precise
synchronization between these signals has to be achieved at the transmitter [68].
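
A minimal baseband sketch of over-the-air combining follows, using time reversal as the precoder because it is the technique this project demonstrates. It assumes two real, randomly generated toy channels, ignores noise and transmit power normalization, and uses illustrative names throughout.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 32
decay = np.exp(-0.1 * np.arange(L))
h1 = rng.standard_normal(L) * decay   # toy channel, antenna 1 -> receiver
h2 = rng.standard_normal(L) * decay   # toy channel, antenna 2 -> receiver

def time_reversed(h):
    """Time-reversal precoder for a real channel (a complex one would also be conjugated)."""
    return h[::-1]

# Each antenna sends its own precoded waveform through its own channel.
rx1 = np.convolve(time_reversed(h1), h1)
rx2 = np.convolve(time_reversed(h2), h2)

# Synchronized transmission: both peaks land on the same sample and add coherently.
rx = rx1 + rx2
peak_index = int(np.argmax(np.abs(rx)))

# A 3-sample timing error between the antennas breaks the coherent addition.
rx_misaligned = rx1 + np.roll(rx2, 3)
```

With perfect alignment the peak equals the sum of the two channel energies; with the timing error it can only be smaller or equal, which is why precise synchronization at the transmitter matters.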

In the context of cognitive radio, a cognitive radio user, or secondary user, can coexist with other secondary and
primary users. Waveform design/optimization gives us more flexibility in designing the radio and may improve
this coexistence further. Consider a scenario where wideband secondary users with low spectral density are
allowed to overlay narrowband primary users. In addition to the traditional communication objectives and
constraints, the spectral mask constraint at the transmitter and the interference cancellation at the receiver can
be jointly considered in optimizing the waveform. The spectral mask constraint is imposed on the secondary-user
transmitted waveform such that the transmitter causes no or limited interference to the primary users, while an
interference cancellation scheme implemented at the secondary-user receiver cancels the narrowband interference
caused by the primary users.
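
As a simple illustration of a transmit-side spectral constraint, the sketch below forces a real wideband waveform to have no energy in the DFT bins occupied by a hypothetical narrowband primary user. The sampling rate, the 400 MHz band and the random stand-in waveform are all assumptions; a real design would fold the mask into the waveform optimization rather than notch after the fact, and zeroing DFT bins only controls the spectrum on the DFT grid.

```python
import numpy as np

def notch_waveform(x, fs, f_lo, f_hi):
    """Zero the DFT bins of the real waveform x that fall inside [f_lo, f_hi] (Hz)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[(f >= f_lo) & (f <= f_hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def band_power_fraction(x, fs, f_lo, f_hi):
    """Fraction of the one-sided DFT power of x that lies inside [f_lo, f_hi]."""
    P = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(P[(f >= f_lo) & (f <= f_hi)].sum() / P.sum())

fs = 2e9                                   # assumed sampling rate
rng = np.random.default_rng(2)
x = rng.standard_normal(512)               # stand-in wideband secondary-user waveform
y = notch_waveform(x, fs, 395e6, 405e6)    # protect a hypothetical primary user near 400 MHz
```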

Though the idea of waveform diversity for radar systems can be traced back to World War II, for many years
computational and hardware limitations kept many waveform design algorithms from being implemented in
radar systems [66]. Today these bottlenecks have been removed, and waveform diversity is once again a hot topic
in the radar community. Time reversal or phase conjugate waveforms, colored waveforms, sparse and regular
non-uniform Doppler waveforms, non-circular waveforms, and so on, are treated with advanced mathematical
tools in [70]. New trends in coded waveform design for radar applications are presented in [71]. Modern
semidefinite programming (SDP) and a novel algorithm for Hermitian matrix rank-one decomposition are
exploited to perform code selection, which can maximize the detection performance while controlling the
Doppler estimation accuracy and the similarity with a pre-fixed radar code [71]. Meanwhile, another force
propelling research on waveform diversity is the introduction of cognition into radar systems. A cognitive radar
can actively learn about the environment, and the whole radar system forms a dynamic feedback loop involving the
transmitter, environment and receiver [72]. Waveform diversity will play an important role in cognitive radar. The
radar transmitter can adjust its illumination (waveform) in an intelligent, effective, adaptive and robust manner,
taking into account the results of learning and perception of the environment [72]. Thus, the philosophy of
sequential testing [73] fits naturally under the umbrella of cognitive radar. Several rounds of test illuminations
are used until sufficient confidence in the decision is reached. The waveform and the transceiver scheme for each
round can be adjusted according to the results of the previous rounds. Adaptive compressed sensing (CS) [74],
for example, offers hints for this line of research.

Previous theoretical research on waveform diversity has not taken robustness seriously into account. There
are several reasons for that: (1) the theory of robust optimization was not yet mature; (2) robustness makes
waveform diversity more complex; (3) research on waveform diversity was limited to computer simulation.
As the theory of robust optimization matures and the bottlenecks of computation and implementation (e.g., the
arbitrary waveform generator) are removed, robust optimization for waveform diversity (or robust waveform
diversity) will attract more attention. Robustness bridges the gap between theoretical work and practical
situations.

Waveform diversity is most obviously implemented at the transmitter side, but it should have a broader meaning
and significance. In particular, any type of waveform-level signal processing at the receiver should also be
included in the waveform diversity framework. One common example is receiver beamforming, both narrowband
and wideband. Robust receiver beamforming is dealt with in [39] [75], where the uncertainty comes from the
mismatch of the steering vector and from the estimation error of the sample covariance matrix of interference
plus noise. If implementation uncertainty is also taken into account [76], robust optimization will play an
important role in waveform diversity.


Chapter  6 


Wideband  Digital  Beamforming 


6.1  Wideband  Multichannel  RF  Front-End  For  Beamforming 

The multichannel receiver is a core building block of any MIMO system, in either communications or radar. From
a military perspective in particular, there has been a critical and constant need for enhanced multichannel
receivers for array and radar applications. More channels and more bandwidth are required, while lower power
consumption, lower cost, and smaller size are expected. There is also a trend toward increasingly digital
receivers, thanks to advances in semiconductor technology. Given the progress in hardware, especially the
dramatic increase of sampling rates in digital processing, newer types of receivers that take advantage of new
concepts and the most advanced components and devices are expected. A variety of available technologies has to
be considered, and compromises have to be made among a large number of factors.


6.2  Overall  Architecture 

Illustrated in Fig. 6.1 is a high-level multichannel receiver architecture containing three functional blocks: analog
front-end (tuner), digitizer and digital backend. In choosing the sampling rate and determining the digital
computational load, we will be somewhat aggressive, expecting that digital processing power will continue to
increase while its cost continues to drop. There are a number of options in selecting a digital processing platform.
We prefer to use an array of high-performance FPGAs, such as the Xilinx Virtex-5 or Virtex-6 series, in conjunction
with a DSP engine. This hybrid signal processing platform is not only flexible but also accommodates both
real-time filtering and computationally intensive beamforming, which makes it particularly attractive for wideband
digital beamforming prototyping.


Figure 6.1: High level multichannel receiver architecture.


6.3 Analog Front-end (Tuner)

In general there are two popular options for radio front-end architecture: heterodyne (or superheterodyne) and
zero-IF (or direct-conversion, homodyne). The heterodyne architecture offers better performance, and it has
therefore been the most popular receiver architecture since it was invented by Edwin H. Armstrong in 1918. On
the other hand, although the concept of zero-IF reception was first proposed by F.M. Colebrook as early as 1924
(6 years after the heterodyne receiver was invented), it was not until 1947 that it was put into practice, the first
application being measurements in telephony. Since the 1990s, the zero-IF architecture has been especially
promoted in the software defined radio (SDR) community. Compared with the traditional receiver architecture,
its main advantages are the following:

(1)  The  problem  of  image  rejection  is  overcome,  so  that  the  receiver  preselection  portion  is  simplified  and  frequency 
planning  is  unnecessary. 

(3)  The  fact  that  most  signal  processing  is  done  at  low  frequencies  implies  using  LPFs  as  channel-select  filters. 

(4)  Amplification  is  mainly  at  the  baseband  stage,  hence  power  is  saved.  In  general,  a  zero-IF  receiver  contains  less 
hardware  and  has  very  high  level  of  integration. 

While attractive, the zero-IF receiver architecture has many drawbacks: DC offset, I/Q imbalance (or mismatch)
in the analog quadrature downconverter, even-order distortion, self-mixing, 1/f (or flicker) noise, and local
oscillator (LO) leakage. In order to combine the advantages of both the heterodyne and zero-IF architectures, a
low-IF receiver architecture was proposed in the late 1990s. This architecture keeps a high level of integration
and eliminates most of the drawbacks associated with zero-IF receivers, but it introduces an image rejection
requirement. When the RF frequencies are lower than 2 GHz (HF, VHF, UHF and L bands), image rejection at the
rather low IF is difficult to implement, and image rejection mixer techniques are required instead of an image
rejection filter. In practical image rejection mixers, the amplitude and phase mismatches between the I and Q
channels ultimately limit the image rejection ratio [77]. In addition, designing and implementing ultra-wideband
image rejection mixers is extremely difficult.
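
The dependence of the image rejection ratio (IRR) on I/Q mismatch can be made concrete with the commonly used textbook expression IRR = (1 + 2g cos(phi) + g^2) / (1 - 2g cos(phi) + g^2), where g is the I/Q amplitude ratio and phi the phase error. The sketch below evaluates it; the formula is the standard one and is not quoted from [77], and the example mismatch values are arbitrary.

```python
import math

def image_rejection_ratio_db(gain_error_db, phase_error_deg):
    """IRR in dB for a given I/Q amplitude imbalance (dB) and phase error (degrees)."""
    g = 10 ** (gain_error_db / 20)          # amplitude ratio between Q and I paths
    phi = math.radians(phase_error_deg)
    num = 1 + 2 * g * math.cos(phi) + g * g
    den = 1 - 2 * g * math.cos(phi) + g * g
    if den == 0:
        return math.inf                      # perfectly matched paths
    return 10 * math.log10(num / den)

irr_typical = image_rejection_ratio_db(0.1, 1.0)    # 0.1 dB and 1 degree of mismatch
irr_better = image_rejection_ratio_db(0.05, 0.5)    # half the mismatch
```

Even a mismatch as small as 0.1 dB and 1 degree limits the rejection to roughly 40 dB, which illustrates why the mismatch is the limiting factor and why holding it over an ultra-wide bandwidth is so demanding.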

Although the low-IF structure has many advantages, it may not be feasible for UWB applications because of the
limitation of the digitizer's sampling rate. The zero-IF structure is therefore the first choice at present. In the
future, low-IF can be considered if the digitizer's sampling rate increases significantly or if the signal bandwidth
is much less than 500 MHz.
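
The sampling-rate argument can be checked with idealized Nyquist arithmetic. The sketch assumes an RF signal of bandwidth B, a zero-IF receiver in which the I and Q ADCs each see a lowpass signal of width B/2, and a low-IF receiver in which a single ADC digitizes a real signal centred at the IF. Filter transition bands and oversampling margins are ignored, so the numbers are lower bounds; the function names are illustrative.

```python
def zero_if_min_rate(bandwidth_hz):
    """Minimum rate for EACH of the I and Q ADCs (each sees a lowpass signal of width B/2)."""
    return 2 * (bandwidth_hz / 2)

def low_if_min_rate(bandwidth_hz, if_hz):
    """Minimum rate for one ADC digitizing a real signal centred at if_hz."""
    if if_hz < bandwidth_hz / 2:
        raise ValueError("IF must be at least B/2 so the band does not fold over DC")
    return 2 * (if_hz + bandwidth_hz / 2)

B = 500e6                                        # the 500 MHz bandwidth mentioned in the text
rate_zero_if = zero_if_min_rate(B)               # per ADC, two ADCs needed
rate_low_if_best = low_if_min_rate(B, B / 2)     # lowest possible IF
rate_low_if_400 = low_if_min_rate(B, 400e6)      # a hypothetical 400 MHz IF
```

Under these assumptions a low-IF receiver needs at least twice the per-ADC rate of a zero-IF receiver for the same bandwidth, and more as the IF is raised.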


6.4  Multichannel  Digitizer  with  FPGA  Based  Filters 


We have chosen Sundance SMT702 digitizer boards to build a wideband beamformer. Fig. 6.2 shows the 4-channel
system architecture, which consists of 5 SMT702 boards; each channel has both an I-phase and a Q-phase path and
is capable of 3 Gsps sampling. In this architecture, Sundance SHB cables are used to connect all the boards.
High-speed data filtering and combining rely on the on-board FPGA chips. A DSP engine, which is not shown in
Fig. 6.2, will take care of the sophisticated beamforming computation.

The SMT702 is a PXI Express Peripheral Module (3U) that integrates two 3 Gsps 8-bit ADCs, clock circuitry,
two banks of DDR2 memory (1 GByte each) and a Xilinx Virtex-5 LX110T-3 FPGA in the 3U format. A further
advantage of this product is that it can run standalone and provides a number of general-purpose I/Os.
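
A back-of-the-envelope calculation shows the raw data rates these figures imply. The sketch assumes full-rate 8-bit samples with no truncation or decimation, reads "1 GByte" as 2^30 bytes, and ignores DDR2 and interconnect bandwidth limits; it is an estimate and not a statement about the board's sustained throughput.

```python
ADC_RATE_SPS = 3e9              # 3 Gsps per ADC (from the text)
BITS_PER_SAMPLE = 8             # 8-bit resolution (from the text)
ADCS_PER_BOARD = 2              # two ADCs per SMT702 (from the text)
DDR2_BANK_BYTES = 1024 ** 3     # "1 GByte" read as 2**30 bytes (assumption)

def adc_throughput_bytes_per_s(n_adcs=1):
    """Raw output rate of n_adcs converters running at full rate."""
    return n_adcs * ADC_RATE_SPS * BITS_PER_SAMPLE / 8

def bank_fill_time_s(n_adcs_per_bank=1):
    """Time for one memory bank to fill when it stores every raw sample."""
    return DDR2_BANK_BYTES / adc_throughput_bytes_per_s(n_adcs_per_bank)

per_adc = adc_throughput_bytes_per_s(1)
per_board = adc_throughput_bytes_per_s(ADCS_PER_BOARD)
four_iq_channels = adc_throughput_bytes_per_s(4 * 2)   # 4 channels, I and Q each
fill_time = bank_fill_time_s(1)
```

At 3 GB/s per converter, a bank fills in roughly a third of a second, so continuous operation depends on the FPGA filtering and combining the samples on the fly.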

Both ADC chips are identical and can each produce 3 giga-samples per second with 8-bit resolution. The
manufacturer is National Semiconductor and the part number is ADC083000. The analog-to-digital converters are
clocked by circuitry based on a PLL coupled with a VCO in order to generate a low-jitter signal. The full bandwidth
is 3 GHz. Each ADC integrates settings such as offset and scale factor, which allows the pair of ADCs to be
combined into a single 6 Gsps analog-to-digital converter. This will be the subject of a specific application note.

Figure 6.2: Wideband beamforming architecture with 5 SMT702 high speed digitizers. (Note: each converter
includes truncation of 4 LSBs or 2:1 multiplexing.)

The Virtex-5 FPGA is responsible for controlling all interfaces, including PXI (32-bit) and PXIe (up to 8 lanes,
although not all PXI Express controllers support 8 lanes), as well as routing samples. The FPGA fitted on the
SMT702 is part of the Virtex-5 family from Xilinx: an XC5VLX110T-3 (the fastest speed grade available).

Two DDR2 memory banks are accessible by the FPGA in order to store data on the fly. An SHB connector is
available to transfer data/samples to another Sundance module (the SMT712, for instance). All analog connectors
on the front panel are SMA.

For software development, the SMT7026 is an efficient, ready-to-use, host-side interface for the SMT702. It allows
us to control the SMT702 from the host and to exchange data between the host and the module. It can configure
the FPGA from the host, transfer data from the SMT702 to the host, and even provide a C++ type interface to the
FPGA module.


6.5  Wideband  Beamforming:  An  Example 

The architecture of wideband beamforming is shown in Fig. 6.3. Assume there are $M$ antennas in the phased array
radar. The distance between antennas is $d$. The mutual coupling in the phased array radar is not considered here.
$\delta(t)$ is the signal in the far field of the phased array radar and impinges on it from the angle $\theta_0$.
The characteristics of the RF chains related to the different branches are given by $h_m(t)$, $t \in [0, T]$.

Because of the limitation of the ADC, the sampled data for each branch is $h_m[n, \theta]$, the time duration of
which is longer than $T$. A fractional delay method is used to get $h_m[n, \theta]$ from $h_m(t)$, $t \in [0, T]$.

The goal of the task is to design a filter $w_m[n]$ for each branch in the digital domain to form the wideband beam
under different requirements. After summation of each branch's response,

$$x[n, \theta] = \sum_{m=1}^{M} h_m[n, \theta] * w_m[n] \qquad (6.1)$$

and

$$x[n_f, \theta] = \mathrm{FFT}\{x[n, \theta]\} \qquad (6.2)$$

The beam of the phased array radar can be defined as

$$b[n_f, \theta] = x[n_f, \theta] \qquad (6.3)$$

Assume the vector representations of $w_m[n]$, $x[n, \theta]$, $x[n_f, \theta]$ and $b[n_f, \theta]$ are
$\mathbf{w}_m$, $\mathbf{x}[\theta]$, $\mathbf{x}_f[\theta]$ and $\mathbf{b}[\theta]$. The Toeplitz matrix
representation of $h_m[n, \theta]$ is $\mathbf{H}_m[\theta]$. $\mathbf{F}$ is the discrete-time Fourier transform
operator. Assume

$$\mathbf{H}[\theta] = \left[\mathbf{H}_1[\theta] \;\; \mathbf{H}_2[\theta] \;\; \cdots \;\; \mathbf{H}_M[\theta]\right] \qquad (6.4)$$

and

$$\mathbf{w} = \left[\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \cdots \;\; \mathbf{w}_M^H\right]^H \qquad (6.5)$$

where $(\cdot)^H$ denotes the conjugate transpose of a matrix.

Figure 6.3: Wideband digital beamforming implementation option. (Steering and pattern control. Option 2:
fractional delays are handled by a LUT. Advantage: no need for a fractional delay section; disadvantage: large
number of combinations to store.)

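Eqn. 6.1 and Eqn. 6.2 can be reformulated and combined as

$$\mathbf{x}_f[\theta] = \mathbf{F}\mathbf{H}[\theta]\mathbf{w} \qquad (6.6)$$

Assume $\mathbf{A}[\theta] = \mathbf{F}\mathbf{H}[\theta]$; thus,

$$\mathbf{b}[\theta] = \mathbf{A}[\theta]\mathbf{w} \qquad (6.7)$$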


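If $\mathbf{p}[\theta_0]$ is the desired array response, i.e. the beampattern at $\theta_0$, then
$\|\mathbf{b}[\theta_0] - \mathbf{p}[\theta_0]\|_2^2$ should be minimized. If there is a broad null from $\theta_1$
to $\theta_2$, then $\sum_{\theta=\theta_1}^{\theta_2} \|\mathbf{b}[\theta]\|_2^2$ should be minimized.

Assume $\Omega_\theta = \left[-\frac{\pi}{2}, \frac{\pi}{2}\right] - \{\theta_0\} - [\theta_1, \theta_2]$; then the
optimization problem can be formulated as

$$\min \; \|\mathbf{b}[\theta_0] - \mathbf{p}[\theta_0]\|_2^2 + \lambda_1 \sum_{\theta=\theta_1}^{\theta_2} \|\mathbf{b}[\theta]\|_2^2 + \lambda_2 \sum_{\theta \in \Omega_\theta} \|\mathbf{b}[\theta]\|_2^2 \qquad (6.8)$$

Let

$$f(\mathbf{w}) = \|\mathbf{b}[\theta_0] - \mathbf{p}[\theta_0]\|_2^2 + \lambda_1 \sum_{\theta=\theta_1}^{\theta_2} \|\mathbf{b}[\theta]\|_2^2 + \lambda_2 \sum_{\theta \in \Omega_\theta} \|\mathbf{b}[\theta]\|_2^2 \qquad (6.9)$$

The optimization problem (6.8) is an unconstrained convex optimization problem, so the optimal solution can be
obtained by solving the following conjugate gradient equation:

$$\frac{\partial f(\mathbf{w})}{\partial \mathbf{w}^*} = 0 \qquad (6.10)$$

Thus,

$$\mathbf{w} = \left( \mathbf{A}[\theta_0]^H \mathbf{A}[\theta_0] + \lambda_1 \sum_{\theta=\theta_1}^{\theta_2} \mathbf{A}[\theta]^H \mathbf{A}[\theta] + \lambda_2 \sum_{\theta \in \Omega_\theta} \mathbf{A}[\theta]^H \mathbf{A}[\theta] \right)^{-1} \mathbf{A}[\theta_0]^H \mathbf{p}[\theta_0] \qquad (6.11)$$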

The simulation setting is $M = 16$, $\theta_0 = 0^\circ$, $\theta_1 = -61^\circ$, $\theta_2 = -51^\circ$,
$\lambda_1 = 1$, $\lambda_2 = 0.0001$, $d = 20$ cm, and the number of taps in each filter is 160. The modulus of
each entry in $\mathbf{p}[\theta_0]$ is 1 and the phase is linear. The simulated wideband beam is shown in Fig. 6.4.
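
The closed-form design can be sketched in a few lines of NumPy. This is a reduced-size illustration and not the simulation behind Fig. 6.4: it uses 4 antennas and 8 taps instead of 16 and 160, assumes ideal RF chains whose only effect is an angle-dependent fractional delay realized with a truncated sinc kernel, and assumes a 1 GHz sampling rate, a 1-degree angle grid and a null sector from -61 to -51 degrees. The sampling rate, response length, bulk delay and all function names are assumptions introduced for the example.

```python
import numpy as np

C0, FS = 3e8, 1e9      # propagation speed (m/s); FS is an ASSUMED sampling rate (Hz)
M, N_TAPS = 4, 8       # reduced from M = 16 and 160 taps to keep the demo small
D = 0.20               # element spacing d = 20 cm
L_H, BULK = 16, 6      # ASSUMED branch-response length and bulk delay (samples)
K = L_H + N_TAPS - 1   # length of each filtered branch output, also the DFT size

def branch_response(m, theta):
    """h_m[n, theta]: ideal RF chain with a fractional delay, via a truncated sinc."""
    delay = BULK + m * D * np.sin(theta) / C0 * FS
    return np.sinc(np.arange(L_H) - delay)

def conv_matrix(h):
    """Toeplitz matrix H_m with H_m @ w_m == np.convolve(h, w_m)."""
    T = np.zeros((K, N_TAPS))
    for k in range(N_TAPS):
        T[k:k + L_H, k] = h
    return T

def A_of(theta):
    """A[theta] = F H[theta] with H[theta] = [H_1 ... H_M]."""
    H = np.hstack([conv_matrix(branch_response(m, theta)) for m in range(M)])
    return np.fft.fft(H, axis=0)   # DFT of every column, i.e. F @ H

def cost(w, theta0, nulls, sides, p, lam1, lam2):
    """Objective f(w): pattern error plus weighted null and sidelobe energy."""
    f = np.linalg.norm(A_of(theta0) @ w - p) ** 2
    f += lam1 * sum(np.linalg.norm(A_of(t) @ w) ** 2 for t in nulls)
    f += lam2 * sum(np.linalg.norm(A_of(t) @ w) ** 2 for t in sides)
    return f

def design(theta0, nulls, sides, p, lam1, lam2):
    """Closed-form minimizer of f(w): solve the normal equations R w = A0^H p."""
    A0 = A_of(theta0)
    R = A0.conj().T @ A0
    for lam, angles in ((lam1, nulls), (lam2, sides)):
        for t in angles:
            A = A_of(t)
            R = R + lam * (A.conj().T @ A)
    rhs = A0.conj().T @ p
    return np.linalg.solve(R, rhs), R, rhs

deg = np.arange(-90, 91)                                   # 1-degree grid (assumption)
theta0 = 0.0
nulls = np.deg2rad(deg[(deg >= -61) & (deg <= -51)])
sides = np.deg2rad(deg[(deg != 0) & ((deg < -61) | (deg > -51))])
p = np.exp(-2j * np.pi * np.arange(K) * (BULK + N_TAPS // 2) / K)  # unit modulus, linear phase

w_opt, R, rhs = design(theta0, nulls, sides, p, lam1=1.0, lam2=1e-4)

# Baseline for comparison: delay-and-sum toward broadside (each filter is a scaled unit delay).
w_das = np.zeros(M * N_TAPS)
w_das[N_TAPS // 2::N_TAPS] = 1.0 / M
```

Evaluating `cost` for `w_opt` and `w_das` shows the optimized filters trading a small pattern error at the look direction for lower energy in the null sector; plotting the norm of `A_of(t) @ w_opt` over the angle grid gives a beam plot of the same kind as the simulated one.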


Figure 6.4: Wideband Beamforming (vertical axis: beam in dB).


Part IV


Wideband Coherent RF Front-End-Device Test


To build a coherent receiver front-end, the wideband quadrature demodulator ADL5380 from Analog Devices has
been chosen. Its performance has been tested using the evaluation board ADL5380-29A-EVALZ for operation from
3 GHz to 4 GHz. As a further building block, the baseband amplifier AD8366 from Analog Devices has also been
introduced.

The ADL5380 is a high-performance quadrature I-Q demodulator that covers an RF input frequency range from 400
MHz to 6 GHz. With typical values of NF = 13 dB, IP1dB = 12 dBm and IIP3 = 31 dBm at 2.5 GHz, the demodulator
offers good dynamic range suitable for demanding infrastructure direct-conversion requirements. The ADL5380
provides a typical voltage conversion gain of 4 dB, which helps minimize the gain requirements of the receiver
front end.

Fig. 6.5 shows the schematic diagram of the evaluation board. The differential RF inputs provide a well-behaved
broadband input impedance of 50 Ω and should be driven from a 1:1 balun for the best performance. For the LO
input interface, a 1:1 RF balun that converts the single-ended input to a differential signal is used.


Figure 6.5: Diagram of ADL5380 Evaluation Board.

The baseband output of the evaluation board can be configured as single-ended or fully differential.
The original baseband outputs are from pins IHI, ILO, QHI and QLO in differential format. For single-ended output,
the TCM9-1 balun converts the differential high-impedance IF output to a single-ended output, and resistors R13x
to R18x are populated for an appropriate balun interface. When loaded with 50 Ω, this balun presents a 450 Ω load
to the device. By populating resistors R2x to R5x with 0 Ω and not populating R13x to R18x, the TCM9-1 transformer
is bypassed to allow for differential baseband outputs. The default baseband outputs are differential, not
single-ended as the datasheet says.

The setup for the quadrature demodulation test is shown in Fig. 6.6. At the transmitter side, the in-phase and
quadrature baseband signals are generated from the DAC's waveform memory and then up-converted by the quadrature
modulator ADL5375. The local oscillator provides a 3.3 GHz carrier for both the modulator and the demodulator
through a power splitter. The modulator and demodulator are connected by an SMA cable to ensure an appropriate
RF signal level. A few test parameters are listed below.


Figure 6.6: Testing setup for ADL5380 Evaluation Board.


1) LO signal generator output: 3.3 GHz, 9 dBm.

2) LO power splitter input: 3.3 GHz, 4.83 dBm, Vpp of 1.104 V (because of cable loss).

3) LO ADL5375 input: 3.3 GHz, -0.5 dBm, Vpp of 592 mV (power splitter attenuation and cable loss).

4) DAC CLK input: 550 MHz.

5) DAC waveforms: sine_1024_14b.vec and ramps_1024_14b.vec.

6) RF ADL5375 modulator output: Vpp of 552 mV.

7) Baseband ADL5380 demodulator output: Vpp of 272 mV.
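
The LO power and voltage readings in the list can be cross-checked against each other. The sketch below assumes a sinusoidal signal in a 50 Ω system, which is an assumption about the measurement setup; the helper names are illustrative.

```python
import math

def dbm_to_vpp(p_dbm, r_ohm=50.0):
    """Peak-to-peak voltage of a sine wave delivering p_dbm into r_ohm."""
    p_watt = 1e-3 * 10 ** (p_dbm / 10)
    v_rms = math.sqrt(p_watt * r_ohm)
    return 2 * math.sqrt(2) * v_rms

def vpp_to_dbm(vpp, r_ohm=50.0):
    """Power in dBm of a sine wave with the given peak-to-peak voltage across r_ohm."""
    v_rms = vpp / (2 * math.sqrt(2))
    return 10 * math.log10(v_rms ** 2 / r_ohm / 1e-3)

splitter_in_vpp = dbm_to_vpp(4.83)     # listed as 1.104 V
modulator_lo_vpp = dbm_to_vpp(-0.5)    # listed as 592 mV
cable_loss_db = 9.0 - 4.83             # generator output minus splitter input
```

Both listed voltages agree with the listed powers to within a few millivolts, which supports reading the LO entries as 50 Ω sine-wave measurements.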

Fig. 6.7 shows the modulator's output when transmitting a sine waveform on both the I-phase and Q-phase
channels. Fig. 6.8 shows the corresponding demodulation results. Fig. 6.9 shows the demodulation results when
transmitting a sine waveform on the I-phase channel and a ramp waveform on the Q-phase channel. All these results
show that the demodulator is working properly.


Figure  6.7:  Modulator’s  output  when  transmitting  both  sine  waveforms  on  I-phase  channel  and  Q-phase  channel. 

The companion AD8366 is the industry's first dual-channel digital gain trim amplifier (DGTA) and is designed to
simplify direct conversion radio receivers and reduce their cost through integration. The combined two-chip
solution, shown in Fig. 6.10, can reduce the active component count by 60 percent within a radio design, providing
considerable board area and bill of material (BOM) savings.

The AD8366 has a 0.25 dB gain step over a 4.5 dB to 20.5 dB gain range. It is well suited for driving analog I/Q
quadrature ADCs while maintaining excellent quadrature accuracy for direct conversion radio receivers. The fine
gain control allows flexible and precise tuning of gain, which is critical to multi-standard and wideband radio
receivers. Additional AD8366 features include an adjustable output common mode for matching ADC input ranges,
and DC output offset correction loops to minimize demodulator DC offsets. The AD8366 is fully specified for direct
conversion receiver designs that must handle wide signal bandwidths with low distortion. For example, the AD8366
achieves second and third harmonic distortion better than 88 dBc for 10 MHz signals at maximum gain.
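
The gain range and step size determine which settings are available when matching the demodulator output to an ADC. The sketch below counts the settings and picks the nearest one for a target level. The 272 mV input is the demodulator output measured in the test above, while the 500 mV target is a hypothetical ADC full-scale value chosen only for illustration; the mapping from a gain value to the device's control word is not modeled.

```python
import math

GAIN_MIN_DB, GAIN_MAX_DB, GAIN_STEP_DB = 4.5, 20.5, 0.25   # range and step from the text

def num_gain_settings():
    """Number of distinct gain settings across the range."""
    return int(round((GAIN_MAX_DB - GAIN_MIN_DB) / GAIN_STEP_DB)) + 1

def nearest_gain_db(target_db):
    """Clamp to the supported range and round to the nearest 0.25 dB step."""
    clamped = min(max(target_db, GAIN_MIN_DB), GAIN_MAX_DB)
    steps = round((clamped - GAIN_MIN_DB) / GAIN_STEP_DB)
    return GAIN_MIN_DB + steps * GAIN_STEP_DB

def gain_needed_db(vpp_in, vpp_target):
    """Voltage gain in dB required to scale vpp_in up to vpp_target."""
    return 20 * math.log10(vpp_target / vpp_in)

needed = gain_needed_db(0.272, 0.500)   # 272 mVpp measured; 500 mVpp is a HYPOTHETICAL target
setting = nearest_gain_db(needed)
```

For this example the required gain is a little under 5.3 dB, so the nearest available setting leaves a residual error well below the 0.25 dB step.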
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Figure  6.8:  Demodulation  results  when  transmitting  both  sine  waveforms  on  I-phase  channel  and  Q-phase  channel. 


Figure 6.9: Demodulation results when transmitting sine waveform on I-phase channel and ramp waveform on
Q-phase channel.

Figure 6.10: Architecture of ADL5380 and AD8366 combined solution.